Set Theory


1 Historical Roots 


Although in retrospect others (Bernard Bolzano, Richard Dedekind) can be 
viewed as precursors, set theory was largely the creation of a single individual, 
Georg Cantor, beginning in the 1870s, and his key work (Cantor, 1915) remains 
highly readable to this day. He launched the field with two results on questions 
with ancient roots. 


1.1 Strings to Ordinals 


Pythagoreans noted that if the lengths of otherwise similar strings are in the ratio 2:1, the shorter sounds an octave higher. Why? Because it vibrates twice as quickly. In modern mathematical language, if the graph of the displacement of the center of the string with time approximates y = cos x for the longer, it will approximate y = cos 2x for the shorter. No real string vibrates so simply, and a better approximation for the long string would be y = a₁ cos x + a₂ cos 2x, with the amplitude a₁ of the “fundamental” much larger than the amplitude a₂ of the “overtone.” By the eighteenth century, workers in analysis, the branch of mathematics beginning with calculus, were dealing with infinite trigonometric series:

$$y = (a_1 \cos x + b_1 \sin x) + (a_2 \cos 2x + b_2 \sin 2x) + (a_3 \cos 3x + b_3 \sin 3x) + \cdots$$


The “vibrating string controversy” engaging Leonhard Euler and others con- 
cerned how wide a class of functions can be represented in this form. The dispute 
exposed, beyond endemic deficiencies of rigor in the treatment of infinite series, 
lack of a common understanding about what is meant by a function. The ensuing 
nineteenth-century rigorization of analysis, besides banning any literal infinities 
or infinitesimals, explaining contexts containing the symbol ∞ without assuming
it to denote anything in isolation, fixed on the maximally general notion of 
function, under which any correlation between inputs and outputs counts, as 
long as there is one and only one output per input. Improved rigor eventually led 
to consensus about the existence of trigonometric series representations. 

But with existence there come uniqueness questions. Could a function have 
two different representations? Does the constant function zero have any other 
than the trivial one with aₙ = bₙ = 0 for all n? Bernhard Riemann showed it
does not if the series converges to zero for all x. But what if one allows an
exceptional point for which convergence is not assumed? Enter Cantor. It 
turns out that even then triviality holds (and, as a conclusion, we get what we 
did not assume as a premise, convergence even at the exceptional point). 
Indeed, one can allow two or any finite number of exceptional points. One 
can even allow infinitely many as long as they are all isolated from one another, 
meaning that for each exceptional x there is a positive ε with no other exceptional points between x − ε and x + ε. One can even allow a doubly exceptional
point, not isolated from other exceptional points. Indeed, one can allow two or 
any finite number. One can even allow infinitely many as long as they are 
isolated from one another. One can even allow a triply exceptional point. And so 
on. And as one goes on, it becomes natural to switch from speaking in the plural 
of the exceptional points to speaking in the singular of the set E of which they 
are elements. What it means to treat E as a single item is to think of operations 
being applicable to it. The relevant operation on sets Cantor called derivation, 
discarding isolated points. Let E₀ be E itself, and let Eₙ₊₁ be the derived set of Eₙ. Riemann’s result was that uniqueness holds if E₀ = ∅, the empty set, with no elements. Cantor’s results were that uniqueness holds if any of E₁, E₂, E₃, ... is empty. Moreover, if we let E_ω be the intersection of the Eₙ, the set of x belonging to all of them, uniqueness still holds if E_ω = ∅.
Moreover, the results continue, with sets indexed by: 


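$$\omega+1,\ \omega+2,\ \omega+3,\ \ldots,\ \omega+\omega = \omega\cdot 2,\ \omega\cdot 3,\ \omega\cdot 4,\ \ldots,\ \omega\cdot\omega = \omega^2,\ \omega^3,\ \ldots,\ \omega^\omega$$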


and more. Here are Cantor’s transfinite ordinal numbers, and, as the notation 
suggests, he introduced an arithmetic for them, with addition, multiplication, 
and exponentiation. 
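One way the transfinite arithmetic departs from the familiar one is that addition is not commutative: 1 + ω = ω, while ω + 1 comes after ω. The Python sketch below illustrates this, assuming for simplicity that we stay below ω^ω. Each ordinal is written in Cantor normal form as a dictionary from an exponent k to a positive coefficient c for the term ω^k·c. The names `add` and `show` are hypothetical.

```python
from typing import Dict

# Hypothetical representation of an ordinal below omega^omega in Cantor normal
# form: {exponent: coefficient}, e.g. {2: 1, 0: 3} stands for ω^2 + 3.
Ordinal = Dict[int, int]


def add(alpha: Ordinal, beta: Ordinal) -> Ordinal:
    """Ordinal sum alpha + beta (coefficients assumed positive)."""
    if not beta:
        return dict(alpha)
    lead = max(beta)  # leading exponent of beta
    # Terms of alpha above beta's leading term survive unchanged,
    result = {k: c for k, c in alpha.items() if k > lead}
    # the coefficients at the leading exponent add,
    result[lead] = alpha.get(lead, 0) + beta[lead]
    # and terms of alpha below it are absorbed, replaced by beta's lower terms.
    for k, c in beta.items():
        if k < lead:
            result[k] = c
    return result


def show(alpha: Ordinal) -> str:
    """Render an ordinal in Cantor normal form, e.g. 'ω^2 + ω·3 + 1'."""
    if not alpha:
        return "0"
    parts = []
    for k in sorted(alpha, reverse=True):
        c = alpha[k]
        if k == 0:
            parts.append(str(c))
        else:
            base = "ω" if k == 1 else f"ω^{k}"
            parts.append(base if c == 1 else f"{base}·{c}")
    return " + ".join(parts)


one, omega = {0: 1}, {1: 1}
print(show(add(one, omega)))    # ω      (1 + ω = ω)
print(show(add(omega, one)))    # ω + 1
print(show(add(omega, omega)))  # ω·2
```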


1.2 Quadrature to Cardinals 


Euclid shows many geometrical figures can be constructed with straightedge 
and compass, indicating the steps involved and proving they lead to the desired 
result. Thus one can duplicate the square, or construct, given the side of a 
square, the side of a square of twice the area, just by taking the diagonal of the 
original square. To show a construction is not possible is more difficult, and requires an analysis available only with the modern coordinate methods, which transform geometric into algebraic problems. Thus duplicating the cube, constructing, given the side of a cube, the side of a cube of twice the volume, turns out to be equivalent to obtaining a key number, ∛2, from rational numbers by addition, subtraction, multiplication, division, and extraction of square roots. And this was proved impossible in the 1830s, disposing of an ancient problem. For quadrature of the circle, constructing for a given circle a square of equal area, the key number is π. Now, although ∛2 is not obtainable in the way indicated, it is at least an algebraic number in the sense of a solution to a polynomial equation:


$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0$$

with rational coefficients aᵢ, namely, x³ − 2 = 0. It was conjectured, however, that π is not even algebraic in this sense. Joseph Liouville showed nonalgebraic or transcendental numbers exist. Then e, the base of the natural logarithms, was shown to be one by Charles Hermite, and, finally, π by Ferdinand von
Lindemann. Between these last two, Cantor showed that the vast majority of 
real numbers are transcendental. 

Since the sets of algebraics and transcendentals are infinite, to say one has 
more elements than the other requires a definition of when the transfinite 
cardinal, or number of elements of one infinite set, A, is equal or unequal to 
that of another, B. Cantor took as his standard of equality the existence of a 
bijection between A and B, a relation under which each element of A is 
associated with exactly one element of B, and vice versa. In the case of the set 
N of natural numbers, the existence of a bijection with a set B means that the 
elements of B can be enumerated or listed in a sequence indexed by 0, 1, 2, ..., 
as in Table 1. An infinite set whose elements can be so enumerated is called
denumerable, while a set that is either denumerable or finite is called countable. 
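The enumerations in Table 1 can be generated mechanically. In this sketch the integers alternate in sign after 0. The positive rationals in lowest terms are listed in groups by the larger of numerator and denominator, in increasing order of size within each group. That grouping rule was inferred from the listed entries and is an assumption, not something the text states. Every positive rational lands in exactly one finite group, so each appears exactly once.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd


def integers():
    """0, 1, -1, 2, -2, ...: every integer appears exactly once."""
    yield 0
    for n in count(1):
        yield n
        yield -n


def positive_rationals():
    """Positive rationals in lowest terms, grouped by max(numerator, denominator)."""
    for k in count(1):
        group = {Fraction(p, k) for p in range(1, k + 1) if gcd(p, k) == 1}
        group |= {Fraction(k, q) for q in range(1, k + 1) if gcd(k, q) == 1}
        yield from sorted(group)  # increasing order of size within the group


print(list(islice(integers(), 9)))
print([f"{q.numerator}/{q.denominator}" for q in islice(positive_rationals(), 9)])
```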

The number of elements of a denumerable set Cantor called ℵ₀ (pronounced “aleph nought”). What the table shows is that signed integers and positive rationals both have cardinal or size ℵ₀; so do the signed rationals. Nowadays, a finite sequence of keystrokes is transmitted electronically as a sequence of zeros and ones, the binary numeral for some natural number that may be considered a code for the sequence. This makes the set of such sequences denumerable, in order of increasing code number. Then, since a polynomial equation of degree n has at most n solutions, each algebraic number can be denoted by an expression such as “the second smallest solution to 2x³ − 9x² − 6x + 3 = 0” and given a code number accordingly. But their
denumerability was established in correspondence between Dedekind and 
Cantor long before the digital age began. 
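The coding of keystroke sequences by natural numbers can be sketched as follows. This particular scheme is an illustrative assumption, not anything the text prescribes: it takes the UTF-8 bytes, prefixed with a marker byte 1 so that leading zero bits are not lost. All that matters is that distinct sequences get distinct codes. Listing sequences in order of increasing code then enumerates them.

```python
def encode(text: str) -> int:
    """Code a finite string as a natural number (injective)."""
    # The marker byte 0x01 keeps leading zero bytes of the text from vanishing.
    return int.from_bytes(b"\x01" + text.encode("utf-8"), "big")


def decode(code: int) -> str:
    """Invert encode: recover the string from its code number."""
    raw = code.to_bytes((code.bit_length() + 7) // 8, "big")
    if not raw or raw[0] != 1:
        raise ValueError("not a code number produced by encode")
    return raw[1:].decode("utf-8")


words = ["ab", "", "a", "b"]
for w in sorted(words, key=encode):  # codes 1, 353, 354, 90466
    print(encode(w), repr(w))
```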

By contrast, Cantor showed that the whole set R of real numbers (and hence 
the set of transcendentals, left over when we remove the algebraics) is not 
denumerable. No countable set can contain even just those whose decimal 


Table 1 Denumerable sets

Set                  Enumeration
Natural numbers      0     1     2     3     4     5     6     7     8
Integers             0     1     −1    2     −2    3     −3    4     −4
Positive rationals   1/1   1/2   2/1   1/3   2/3   3/2   3/1   1/4   3/4
4 Philosophy and Logic
Table 2 The diagonal argument

Index   Zero-one sequence
0       0*  0   0   0   0   0   0   0   0
1       1   1*  1   1   1   1   1   1   1
2       0   1   0*  1   0   1   0   1   0
3       1   0   1   0*  1   0   1   0   1


expansion involves only 0s and 1s; or what is the same, all infinite zero-one 
sequences; or what is the same, all sets of natural numbers, each such being 
representable by the zero-one sequence with one in the nth place if and only if n 
is in the set. This he established by his famous diagonal argument. Suppose we 
have an enumeration of some set S of infinite zero-one sequences, as in Table 2. 
Go down the diagonal, marked with asterisks. Take in order for each n the digit 
appearing in the nth place in the nth row of the table. This gives 0100... . Now 
swap the zeros and the ones. This gives 1011 ..., a sequence that does not 
belong to the denumerable set S, since it differs in the nth place from the nth 
sequence. Cantor called the cardinal of the real numbers or points of the line c. 
Analogously to the results in Table 1, he showed that the
positive real numbers, or even just those in a finite interval, also have cardinal c, 
as do pairs of real numbers, or equivalently complex numbers. He also intro- 
duced an arithmetic, with addition, multiplication, and exponentiation, for his 
cardinals. 
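The diagonal construction can be carried out on any finite stretch of an enumeration. In this sketch each zero-one sequence is modeled by a Python function from a place n to the digit there, an assumption made for illustration. The four rows are those of Table 2. Since only four rows are given, the new sequence is computed only at places 0 to 3.

```python
from typing import Callable, List

ZeroOneSeq = Callable[[int], int]  # n -> digit in the nth place (0 or 1)

# The four rows of Table 2.
rows: List[ZeroOneSeq] = [
    lambda n: 0,            # 0000...
    lambda n: 1,            # 1111...
    lambda n: n % 2,        # 0101...
    lambda n: (n + 1) % 2,  # 1010...
]


def antidiagonal(enumeration: List[ZeroOneSeq]) -> ZeroOneSeq:
    """Swap 0 and 1 along the diagonal, so the result differs from row n at place n."""
    return lambda n: 1 - enumeration[n](n)


d = antidiagonal(rows)
diagonal = [rows[n](n) for n in range(len(rows))]
print(diagonal)                          # [0, 1, 0, 0]
print([d(n) for n in range(len(rows))])  # [1, 0, 1, 1]
for n, row in enumerate(rows):
    assert d(n) != row(n)                # d is not the nth sequence
```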

Cantor’s audacious introduction of ω and ℵ₀ when mathematicians had just
finished explaining away ∞ provoked a reaction. But Cantor’s theory won
acceptance among leaders in the rising generation fairly quickly (as examples 
they put forth, such as the one-, two-, and three-dimensional Cantor set, 
Sierpinski carpet, and Menger sponge, whose images appear all over the 
Internet today, captured the imagination of amateurs). The leading mathemat- 
ician David Hilbert insisted: “No one shall expel us from the paradise Cantor 
created for us.” 


2 The Notion of Set 


Many objections turned on certain paradoxes. Cantor, unlike his contemporary 
Gottlob Frege, never made the assumptions that led to these paradoxes, but he 
did not make clear enough what assumptions he was making. His successors 
had to be more clear and explicit. Explicit axiomatization began in the first 
decade of the twentieth century with Ernst Zermelo (1908/1967). His system, 


Downloaded from https://www.cambridge.org/core. IP address: 154.28.188.203, on 03 Feb 2022 at 10:51:02, subject to the Cambridge Core terms 
of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.1017/9781 108981828 




with additions and amendments, mainly by Abraham Fraenkel (1922/1967), 
remains the one accepted today, when it is recognized that the paradoxes result
mostly from confusing the notion of set behind the axioms of Zermelo–Fraenkel
set theory with Choice (ZFC) with other ideas.


2.1 Collections 


The expression “a multiplicity of objects” begins singular but ends plural, and 
may be understood as referring either to a plurality, a many, or to a universal, a 
one as opposed to a many. Universals include properties, which are intensional, 
meaning that two may be different even while having exactly the same 
instances, as with the stock example of “being a coin in my pocket” and “being a
penny in my pocket,” which are distinct properties even if I have no coins in my
pocket but pennies. They also include aggregates completely determined by 
their components. One kind, topic of a theory called mereology, is a fusion of a 
plurality of component parts into a single whole, in a way that permits different 
pluralities to have the same fusion, as do the eight ranks and the eight files of a 
chessboard, the fusion being the selfsame chessboard in either case. By contrast 
we have collections, in which many are gathered into a one without losing track 
of which many they were. 

The notion of collection in Frege (1893) was that of an extension. Here we 
start with all objects, and take what he called a concept (associated with a 
predicate), and divide objects into those that fall under the concept (satisfy the 
predicate) and those that do not. The collection of those that do is the extension 
of the concept, so that the extensions of two concepts are the same if and only if 
the concepts are coextensive, having exactly the same things falling under them. 
Graphically, we may represent the unbounded range of all objects with which 
we start as an unbounded blank page, and represent the extension as given by a 
dividing line or curve separating objects inside from objects outside, as in 
Figure 1. But for Frege, the extension is itself an object: If represented by a 
dot, that dot must fall on the page on one side or the other of the division — but 
which? That is the question indicated by the question marks in the figure. 

Bertrand Russell raised an embarrassing issue about the extension R of the concept “is an extension that as an object is outside, not inside, itself.” In the case of the universal extension V, the extension of “is self-identical,” V is inside itself since everything is inside V. In the case of the empty extension ∅, the extension of “is not self-identical,” ∅ is outside itself since nothing is inside ∅. Hence ∅ is inside, and V is outside, the Russell extension R. But just as the statement “this very statement is false” seems to be true if it is false and false if it is
Figure 1 An extension

Figure 2 An ensemble


true, so R seems to be inside itself if outside itself, and outside if inside. This is 
the Russell paradox as Russell (1902) put it to Frege. 

Contrasting with this inconsistent “top down” notion of extension is the 
“bottom up” notion of an ensemble. Here we start with a given “universe of 
discourse,” which might be represented by a box, and a predicate will, like a curve 
in a Venn diagram, mark off the ensemble of things in the universe that do satisfy
it from things in the universe that do not. The ensemble does not, however, itself 
belong to the universe. A dot representing it would lie outside the box, as in 
Figure 2. Implicit here is the possibility of iteration. We can add a new box atop 
the original, to accommodate all the dots representing ensembles of things in the 
lower box, and then more. But there are two ways to implement this idea. 

On the layered approach of the theory of types, deriving from Russell (1908) by way of Frank Ramsey (1925), we have a hierarchy with individuals at the bottom type zero, collections called classes of type zero items at type one, classes of type one items at type two, and so on. Even if we assume no items at type zero, there will be one item at type one, the empty class ∅₀ of type zero items, and then two items at type two, the empty class ∅₁ of type one items, and the singleton class {∅₀} of the one item at type one. At type three, there will be four items, as in Table 3. With one item at type zero, there will be two at type
Table 3 The layered hierarchy

Type   Items
4      Sixteen Items
3      ∅₂, {∅₁}, {{∅₀}}, {∅₁, {∅₀}}
2      ∅₁, {∅₀}
1      ∅₀
0      No Items


one, then four, then sixteen. But with only finitely many individuals, there will
only ever be finitely many items of any one type. For mathematical
purposes, Russell assumed infinitely many individuals.
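The counts in Table 3 can be checked by brute force. In this sketch, which is an illustration rather than part of the theory of types itself, each class is tagged with its type. That way the empty class of type zero items and the empty class of type one items come out as different items, as the layered approach requires. The helper names are hypothetical.

```python
from itertools import combinations


def subsets(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def layered(individuals, top):
    """Items of types 0..top; a class of type t is the tagged pair (t, members)."""
    types = [set(individuals)]
    for t in range(1, top + 1):
        types.append({(t, members) for members in subsets(types[t - 1])})
    return types


print([len(items) for items in layered([], 4)])     # [0, 1, 2, 4, 16]
print([len(items) for items in layered(["a"], 3)])  # [1, 2, 4, 16]
```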


2.2 Sets 


By contrast, we have the cumulative approach, where successive boxes are 
nested, like Chinese boxes or Russian dolls, each higher one adding a new level 
of collections called sets. In box zero are individuals or Urelemente; at level 
one, sets whose elements are individuals; in box one, individuals and level-one 
sets; at level two, any new sets whose elements come from box one; in box two, 
box-one and level-two items; and so on. 

In ZFC, we consider only pure sets, without individuals. There then will be no items at level zero, and one item, the empty set ∅, at level one, in box one. As for level two, from the one item in box one can be formed two sets: the empty set ∅ and its singleton {∅}, but the former we already have, so only the latter is new. In box three will be four items, two new at level three. In box four will be sixteen items, twelve new at level four. And so on, as in Table 4.
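For pure sets, Python's frozenset, which is likewise determined by its elements, gives a quick check of the finite part of Table 4. Box k + 1 is taken to be the set of all subsets of box k. For pure sets this already contains everything in box k. This is an illustrative sketch; only finitely many levels can be computed this way.

```python
from itertools import combinations


def power_set(s):
    """All subsets of the finite set s, as a frozenset of frozensets."""
    elems = list(s)
    return frozenset(
        frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)
    )


def boxes(n):
    """Boxes 0..n of the cumulative hierarchy of pure sets (no individuals)."""
    result = [frozenset()]  # box 0 is empty
    for _ in range(n):
        # Box k + 1 is the set of all subsets of box k,
        # which already includes every item of box k.
        result.append(power_set(result[-1]))
    return result


b = boxes(4)
print([len(box) for box in b])                      # [0, 1, 2, 4, 16]
print([len(b[k] - b[k - 1]) for k in range(1, 5)])  # new at each level: [1, 1, 2, 12]
```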

After all finite levels, we may recognize a box ω containing everything of
finite level but nothing new, and then form a level ω + 1 for sets whose elements
come from level ω, meaning from any finite level, but do not themselves appear
at any such level, containing as they do sets of arbitrarily high finite level. We
can then continue through the transfinite ordinals. Zermelo at first claimed for 
his axioms only that they permitted none of the known deductions of contradic- 
tions, and seemed adequate to develop Cantor’s set theory (as they are with 
Fraenkel’s friendly amendments). Only later (as in Zermelo, 1930) did some- 
thing like the picture in the table emerge. 

The ideal of rigor is that one should list in advance all primitives, notions 
assumed meaningful without definition, and postulates or axioms, results 
assumed true without demonstration, and given these principles all further 
Table 4 The cumulative hierarchy

Level   New items
ω + 1   {∅, {∅}, {{∅}}, {{{∅}}}, ...} and Many Other New Items
ω       No New Items
4       Twelve New Items
3       {{∅}}, {∅, {∅}}
2       {∅}
1       ∅
0       No Items
Table 5 Primitive logical notions

Symbol   Operation                    Reading
¬        Negation                     “not”
∧        Conjunction                  “and”
∨        Disjunction                  “or”
∀        Universal quantification     “for all”
∃        Existential quantification   “for some” or “there exists”


notions or results should be logically derived, by definition or deduction. In set 
theory, there is just one primitive, written with a stylized epsilon symbol, x ∈ y,
read “x is an element of y” or “x is in y” or “y contains x.” All other notions must
be defined in terms of this and the logical notion of identity using the logical
operators in Table 5. A formula Φ is built up from atomic formulas x ∈ y and
x = y using the five operations in the table.

Some minimal familiarity with logical notions and notations must be 
assumed here (for a quick review, see Boolos, Burgess, and Jeffrey, 2002, 
chapters 9 and 10), including an ability to recognize simple logical laws. In 
particular, familiarity is assumed with the distinction between “free” and 
“bound” occurrences of variables in a formula, those that are not and those 
that are caught by a quantifier. For example, in the formula asserting the nonemptiness of x, namely ∃y(y ∈ x), the x is free but the y is bound. The latter could be changed to z without changing the meaning. Other logical and set-theoretic
notions may be defined in terms of what we have so far, as in Tables 6 and 7, but 
officially these are mere abbreviations. 
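The free/bound distinction can be computed by recursion on how a formula is built up. The tuple encoding below is a hypothetical one chosen for this sketch. It covers the atomic formulas x ∈ y and x = y and the five operations of Table 5.

```python
from typing import FrozenSet, Tuple

# Hypothetical encoding: ("in", x, y), ("eq", x, y), ("not", A), ("and", A, B),
# ("or", A, B), ("forall", v, A), ("exists", v, A); variables are strings.
Formula = Tuple


def free_vars(phi: Formula) -> FrozenSet[str]:
    """Variables with at least one occurrence not caught by a quantifier."""
    op = phi[0]
    if op in ("in", "eq"):
        return frozenset(phi[1:])
    if op == "not":
        return free_vars(phi[1])
    if op in ("and", "or"):
        return free_vars(phi[1]) | free_vars(phi[2])
    if op in ("forall", "exists"):
        return free_vars(phi[2]) - {phi[1]}  # the quantifier binds its variable
    raise ValueError(f"unknown operation {op!r}")


nonempty = ("exists", "y", ("in", "y", "x"))     # ∃y(y ∈ x)
print(sorted(free_vars(nonempty)))                # ['x']
renamed = ("exists", "z", ("in", "z", "x"))      # ∃z(z ∈ x): same meaning
print(free_vars(renamed) == free_vars(nonempty))  # True
```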
Table 6 Defined logical notions

Abbreviation   Definition            Operation          Reading
Φ ⊃ Ψ          ¬Φ ∨ Ψ                Conditional        “if Φ then Ψ”
Φ ≡ Ψ          (Φ ⊃ Ψ) ∧ (Ψ ⊃ Φ)     Biconditional      “Φ if and only if Ψ” or “Φ iff Ψ”
x ≠ y          ¬x = y                Nonidentity        “x is distinct from y”
∃!x Φ(x)       ∃x∀y(Φ(y) ≡ x = y)    Unique existence   “there exists a unique”
existence 


Table 7 Defined set-theoretic notions

Abbreviation   Definition            Reading
x ∉ y          ¬x ∈ y                “x is not an element of [or not in] y”
x ⊆ y          ∀z(z ∈ x ⊃ z ∈ y)     “x is a subset of [or included in] y”
∀x∈y Φ(x)      ∀x(x ∈ y ⊃ Φ(x))      “for all x in y ...”
∃x∈y Φ(x)      ∃x(x ∈ y ∧ Φ(x))      “for some x in y ...”


3 The Zermelo–Fraenkel Axioms


The axioms of the system ZFC will be presented next, in both words and 
symbols, to be assumed without proof, but not without something in the way 
of informal, intuitive justification. 


3.1 Statement 


The first axiom says sets with the same elements are the same. It has two 
equivalent formulations: 


$$\text{Extensionality}\qquad (1)\ \forall z\,(z \in x \equiv z \in y) \supset x = y, \qquad (2)\ x \subseteq y \wedge y \subseteq x \supset x = y.$$


By convention, in displaying formulas initial universal quantifiers are omitted, so what is meant is really ∀x∀y( ___ ) where what is explicitly written is ___. As (2) suggests, proofs of identities most often come in two parts, proving inclusion in two directions. Extensionality implies that if there is a set y whose elements are all and only the sets x satisfying a condition Φ, it is unique. That unique set, if it exists, is denoted {x | Φ(x)}, and we have z ∈ {x | Φ(x)} iff Φ(z). Frege’s inconsistent assumption would be an axiom of comprehension, according to which {x | Φ(x)} always exists for any condition Φ. Applied to the condition x ∉ x this would give the Russell paradox, and it is not assumed in ZFC.
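The argument against comprehension does not depend on what sets are. The following Lean 4 sketch makes this explicit. For any type α and any binary relation `mem` on it (an arbitrary stand-in for ∈, introduced here for illustration), nothing r can satisfy ∀x(mem x r ↔ ¬mem x x). That is exactly what comprehension applied to the condition x ∉ x would demand.

```lean
-- `mem x y` is an arbitrary relation standing in for x ∈ y.
theorem no_russell_set {α : Type} (mem : α → α → Prop) :
    ¬ ∃ r : α, ∀ x : α, mem x r ↔ ¬ mem x x :=
  fun ⟨r, hr⟩ =>
    -- Instantiating at x := r gives: r ∈ r ↔ r ∉ r.
    have hn : ¬ mem r r := fun h => (hr r).mp h h
    hn ((hr r).mpr hn)
```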

The second axiom says that if we already have some set u, we can at least 
separate out from u those of its elements that satisfy a condition Φ to form
{x ∈ u | Φ(x)}:

$$\text{Separation}\qquad \exists y\,\forall x\,(x \in y \equiv (x \in u \wedge \Phi(x))).$$


This is not a single formula, but rather a rule to the effect that anything of a 
certain form counts as an axiom. The cases for different Φ are called instances of
the scheme of separation. (Zermelo’s original formulation was vaguer.) Note 
that separation implies there is no universal set of all sets V = {x|x = x}. If 
there were, we could, by separation, obtain comprehension. 

Further axioms state the existence of certain specific sets: 


$$\text{Pairing}\qquad \exists y\,(u \in y \wedge v \in y).$$
$$\text{Union}\qquad \exists y\,\forall z \in X\,\forall x \in z\,(x \in y).$$


With what we have so far, some basic existence results then become deducible, 
those in Table 8. (The expression “family” used in the table may be used for any 
set of sets.) 

Separation gives us the empty set, since given any set u at all — and even 
pure logic assumes there is at least one item in the domain our quantifiers 
range over, which in the present case consists of sets — separation gives 
{x ∈ u | x ∉ u}, which is empty. It also gives twofold intersections, and by the alternative definition, family intersections, if the family X has at least one member u; also differences. Now given y containing u and v, we can separate out the elements of y identical to one of those two, so pairing with separation gives the unordered pair. Union with separation gives us family union. The unordered triple and twofold union we then get using the alternative definitions. The difference u − v is also called the relative complement of v in u. An absolute complement −v = {x | x ∉ v} cannot exist, because v ∪ −v would be the nonexistent V.
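Separation and the constructions it yields can be mirrored on finite sets of Python frozensets. In this sketch the helper `separate` and the sample sets u and v are hypothetical. It obtains the empty set, an intersection, and a difference, each by separating from a given set. It also forms the Russell set relative to u, {x ∈ u | x ∉ x}. That set can never be an element of u, which is why separation yields no paradox. Since no frozenset is a member of itself, here that set is simply u.

```python
def separate(u, phi):
    """Separation: the set {x ∈ u | phi(x)} carved out of a given set u."""
    return frozenset(x for x in u if phi(x))


e = frozenset()                        # ∅
s = frozenset({e})                     # {∅}
u = frozenset({e, s, frozenset({s})})  # {∅, {∅}, {{∅}}}
v = frozenset({s, frozenset({e, s})})  # {{∅}, {∅, {∅}}}

empty = separate(u, lambda x: x not in u)    # {x ∈ u | x ∉ u}
meet = separate(u, lambda x: x in v)         # u ∩ v
diff = separate(u, lambda x: x not in v)     # u − v
russell = separate(u, lambda x: x not in x)  # {x ∈ u | x ∉ x}

print(empty == e, meet == {s}, diff == {e, frozenset({s})})  # True True True
print(russell in u)                                          # False
```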

The next two axioms are these: 


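$$\text{Power}\qquad \exists y\,\forall x\,(x \subseteq u \supset x \in y).$$
$$\text{Infinity}\qquad \exists y\,(\varnothing \in y \wedge \forall x \in y\,(\{x\} \in y)).$$

Power with separation gives the power set P(x) = {y | y ⊆ x} and also

$$\{y \subseteq x \mid \Phi(y)\} = \{y \in P(x) \mid \Phi(y)\}.$$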
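For a finite set the power set can be listed outright. A set of the form {y ⊆ x | Φ(y)} then comes from separating P(x), exactly as in the identity just given. A sketch with hypothetical helper names follows, using `itertools.combinations`. The integers 1, 2, 3 simply stand in for arbitrary elements.

```python
from itertools import combinations


def power_set(x):
    """P(x) for a finite set x, as a frozenset of frozensets."""
    elems = list(x)
    return frozenset(
        frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)
    )


x = frozenset({1, 2, 3})
px = power_set(x)
pairs = frozenset(y for y in px if len(y) == 2)  # {y ⊆ x | y has two elements}
print(len(px), len(pairs))                       # 8 3
```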
Table 8 More defined notions




Name                   Symbol      Definition                   Alternatively
Empty set              ∅           {x | x ≠ x}
Singleton              {u}         {x | x = u}                  {u, u}
Unordered pair         {u, v}      {x | x = u ∨ x = v}
Unordered triple       {u, v, w}   {x | x = u ∨ x = v ∨ x = w}  {u, v} ∪ {w}
Twofold intersection   u ∩ v       {x | x ∈ u ∧ x ∈ v}          {x ∈ u | x ∈ v}
Family intersection    ⋂X          {x | ∀z(z ∈ X ⊃ x ∈ z)}      {x ∈ u | ∀z(z ∈ X ⊃ x ∈ z)}
Twofold union          u ∪ v       {x | x ∈ u ∨ x ∈ v}          ⋃{u, v}
Family union           ⋃X          {x | ∃z(z ∈ X ∧ x ∈ z)}
Difference             u − v       {x | x ∈ u ∧ x ∉ v}          {x ∈ u | x ∉ v}
Figure 3 Partition and selector


Infinity guarantees the existence of a set that contains all of ∅ and {∅} and
{{∅}} and so on, and hence is infinite; alternative formulations are possible;
more detailed discussion is postponed. 
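The first few of the sets that infinity guarantees, ∅, {∅}, {{∅}}, ..., can be generated by repeatedly forming singletons, as in this sketch (the function name is hypothetical). No finite computation produces the whole infinite set; that is what the axiom is needed for.

```python
def singleton_chain(n):
    """The first n sets in the chain ∅, {∅}, {{∅}}, ..."""
    chain = [frozenset()]
    while len(chain) < n:
        chain.append(frozenset({chain[-1]}))  # x ↦ {x}
    return chain[:n]


chain = singleton_chain(5)
print(len(set(chain)))  # 5: the sets are all distinct
```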

Also postponed is detailed discussion, beyond its mere statement, of the 
widely known axiom of choice (AC), pictured in Figure 3. 

Sets whose intersection is nonempty are said to meet or overlap; those whose 
intersection is empty are called disjoint, and a family any two members of which 
are disjoint is called pairwise disjoint, a family of nonempty, pairwise disjoint sets 
is called a partition (of its union), and the members of the family the cells thereof. The axiom of choice asserts that for any partition there is a selector, a set containing exactly one element from each cell (represented in the figure by the scattered dots). Alone among the axioms, AC asserts the existence of a set satisfying a certain condition without giving a definition of such a set as {x | Φ(x)} for any Φ:


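$$\text{Choice}\qquad \forall X\,\big(\forall x \in X\,(x \neq \varnothing) \wedge \forall x \in X\,\forall y \in X\,(x \neq y \supset x \cap y = \varnothing) \supset \exists Y\,\forall x \in X\,\exists! y \in Y\,(y \in x)\big).$$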
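For a finite partition no axiom beyond the others is needed to get a selector. One can simply pick, say, the least element of each cell, as in this sketch. It assumes the elements are comparable (here, integers), and the names are hypothetical. What AC adds concerns infinite partitions, where no such uniform rule need be available.

```python
def is_partition(family):
    """True if the cells are nonempty and pairwise disjoint."""
    cells = list(family)
    return all(cells) and all(
        a.isdisjoint(b) for i, a in enumerate(cells) for b in cells[i + 1:]
    )


def selector(family):
    """One element from each cell; here, by a definable rule, the least one."""
    if not is_partition(family):
        raise ValueError("not a partition")
    return frozenset(min(cell) for cell in family)


X = {frozenset({1, 4}), frozenset({2}), frozenset({3, 5, 6})}
Y = selector(X)
print(sorted(Y))                              # [1, 2, 3]
print(all(len(Y & cell) == 1 for cell in X))  # True
```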


Fraenkel’s distinctive addition to Zermelo’s axioms, replacement, is a scheme saying that if to each element x of a set u there is associated a unique y satisfying a condition Φ(x, y) (call it φ(x)), we may replace each x in u by φ(x) and form the set {φ(x) | x ∈ u}. Actually, it is enough to assume there is a set containing all φ(x) for x ∈ u and then apply separation to get the set of all and only the φ(x) for x ∈ u. So, the new assumption we need is this:

$$\text{Replacement}\qquad \forall x \in u\,\exists! y\,\Phi(x, y) \supset \exists v\,\forall x \in u\,\exists y \in v\,\Phi(x, y).$$
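This finite sketch replaces the condition Φ(x, y) with a Python function φ. That is an assumption, since in ZFC φ is given only through a formula. The set {φ(x) | x ∈ u} is then simply the image of u, and it never has more elements than u has. The name `image` is hypothetical.

```python
def image(u, phi):
    """{phi(x) | x ∈ u}: replace each element x of u by phi(x)."""
    return frozenset(phi(x) for x in u)


e = frozenset()
u = frozenset({e, frozenset({e})})               # {∅, {∅}}
singletons = image(u, lambda x: frozenset({x}))  # {{∅}, {{∅}}}
collapsed = image(u, lambda x: e)                # {∅}
print(len(u), len(singletons), len(collapsed))   # 2 2 1
```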


3.2 Motivation 
While “intuition” may not be appealed to in proofs of theorems, still where
axioms are connected with an intuitive picture, it may at least suggest
conjectures, besides being a source of confidence in the consistency of an
axiom system, beyond the mere inductive consideration that no contradiction 
has been found so far. For such reasons, interest attaches to the relationship 
between the axioms of ZFC (beyond extensionality) and the cumulative 
hierarchy picture. 

For separation, the nonexistence of a universal set V is clear, since the 
elements of a set that appears at a given level come from lower levels. By contrast, if
a set x appears at a given level, then its elements all appear at lower levels, 
including such of them as satisfy some condition ®, and hence the set of all such 
will appear at a level no higher than that of x itself. 

For pairing, if u appears at some level and v at some level, one of these levels will have to be no earlier than the other, and both u and v will be present at that level, and so {u, v} should appear at the very next. For union, if X appears at some level, every element appears at some earlier level and every element of such an element at some still earlier, so all elements of elements will be present at levels below that of X, and the set ⋃X will be present by the same level as X. For power, if u appears at some level, then we have seen all its subsets are present by that level, and so P(u) should appear at the very next. For infinity, it asserts no more than the existence of such a set as we see at level ω + 1 in Table 4 in Section 2.2.

For choice, if a partition X occurs at some level, it is easily seen any selector
for it will appear by that same level. But is there any selector? The assumption 
that the hierarchy is maximally “wide,” admitting at a given level all sets that 
could conceivably be formed from elements at lower levels, means that we 
should not be imposing any requirement of definability as a precondition for set 
existence. Historically, objections to AC have generally rested on implicit 
imposition of some such precondition, so the cumulative hierarchy picture 
excludes the major antichoice argument. But that is not quite to say that it 
provides a substantive prochoice argument, and the axiom remains, to a degree, 
controversial. Although it is no longer common for working mathematicians to 
star theorems whose proof depends on AC, set theorists keep track. 

For replacement, many feel the understanding that the cumulative hierarchy 
is supposed to be maximally “high,” admitting all levels that could conceivably
be admitted, supports the axiom. But here the influence may be felt of what 
some would claim is a further thought, a doctrine of limitation of size, according 
to which all that can prevent a plurality of sets from being collected together into 
a set would be there being too many of them. (Cantor distinguished the inconsistent multiplicities that cannot be collected into a whole from the consistent ones that can, by the former’s being absolutely infinite where the latter are only transfinite.) The idea would be that in {φ(x) | x ∈ u} there would not be too many elements to form a set, since there would be no more
than there are in u, which already is a set. See Boolos (1971) for critical 
discussion. 

There remains an axiom not always counted as part of ZFC — and, in 
particular, not so counted in at least one widely used introductory textbook — 
although so counted here. It has two equivalent formulations. 


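$$\text{Foundation}\qquad (1)\ \forall x\,(x \neq \varnothing \supset \exists y \in x\,\forall z \in x\,(z \notin y)),$$
$$(2)\ \forall x\,(x \neq \varnothing \supset \exists y \in x\,(y \cap x = \varnothing)).$$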


In words, if x has any elements at all, then it has an element y that is epsilon 
minimal, meaning that there is no element z of x with z ∈ y. The axiom, which
also goes by the alias regularity, is directly suggested by the cumulative 
hierarchy picture: If x has any elements, it must have an element y of lowest 
possible level for an element of x, and such a y will be epsilon minimal. Some 
immediate consequences: 


There is no set x with x ∈ x.
There are no sets x, y with x ∈ y ∈ x.
There are no sets x, y, z with x ∈ y ∈ z ∈ x.


Why not? Because {x} or {x, y} or {x, y, z}, as the case may be, would have 
no epsilon-minimal element. The axiom also excludes the existence of any 
infinite descending chain with x₁ ∈ x₀ and x₂ ∈ x₁ and x₃ ∈ x₂ and so on.
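Formulation (2) of foundation can be checked mechanically on hereditarily finite sets built from Python frozensets. Such a set can never be a member of itself, nor sit on any descending membership chain. The sketch below uses hypothetical names. For a nonempty x it finds an element y with y ∩ x = ∅.

```python
def epsilon_minimal(x):
    """Return some y ∈ x with y ∩ x = ∅ (no element of x belongs to y)."""
    for y in x:
        if y.isdisjoint(x):
            return y
    raise ValueError("no epsilon-minimal element")  # cannot happen for nonempty x


e = frozenset()
one = frozenset({e})
two = frozenset({e, one})
x = frozenset({one, two})         # {{∅}, {∅, {∅}}}
print(epsilon_minimal(x) == one)  # True: {∅} shares no element with x
```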
Alongside orthodox set theory ZFC, there exist heterodox “alternative” set 
theories. Incurvati (2020) surveys several, including two that permit infinite 
descending sequences: a “graph” conception due to Peter Aczel and a “stratified” 
conception due to W. V. Quine. (He also considers a “paraconsistent” conception
that accepts comprehension and the Russell paradox, but adopts a deviant logic in 
hopes of quarantining the contradiction.) See also Holmes (2017). 


4 Immediate Consequences 


Some consequences of the axioms were established well before set theory 
became a separate subject. 


4.1 The Algebra of Sets 


An important step toward modern logic was taken by George Boole, whose 
Laws of Thought (1854) contains formulas in algebraic symbolism each admitting two readings: as a principle of logic and as what we recognize retrospectively as one of set theory. Thus the formula a · b = b · a expresses both the logical law of the commutativity of conjunction, Φ ∧ Ψ iff Ψ ∧ Φ, and the set-theoretic law of the commutativity of intersection, x ∩ y = y ∩ x. Suppose we are working for a time only with subsets of some given set J, and allow ourselves to write −x for J − x. Then the first batch of theorems of ZFC consists of equations of so-called Boolean algebra for ∩ and ∪ and −. The
proof of such an equation consists in applying extensionality after showing that 
any item will belong to the right side if and only if it belongs to the left side; and the proof of
that consists in unpacking the definitions and applying a law of logic, the very 
law that in Boole’s notation would be expressed by the same algebraical formula 
as the set-theoretic result we are trying to prove, thus: 


$$z \in x \cap y \ \text{iff}\ z \in x \wedge z \in y \ \text{iff}\ z \in y \wedge z \in x \ \text{iff}\ z \in y \cap x.$$


Any number of further laws of the algebra of sets are found in Table 9, in pairs
of “dual” laws on the same row. Another, not in the table, is the law −−x = x,
corresponding to the law of double negation: ¬¬Φ iff Φ.
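Since the laws concern arbitrary subsets of J, for a small finite J they can be checked exhaustively. The sketch below is an illustration, not a proof. It takes J = {0, 1, 2}, an arbitrary choice, and tests commutativity of intersection, double complementation, and one of De Morgan's laws over all subsets.

```python
from itertools import combinations

J = frozenset({0, 1, 2})  # an arbitrary small choice of the given set J
subsets = [frozenset(c) for r in range(len(J) + 1) for c in combinations(J, r)]


def comp(x):
    """Relative complement −x = J − x."""
    return J - x


for x in subsets:
    assert comp(comp(x)) == x                    # −−x = x
    for y in subsets:
        assert x & y == y & x                    # x ∩ y = y ∩ x
        assert comp(x | y) == comp(x) & comp(y)  # −(x ∪ y) = −x ∩ −y
print(f"laws hold for all {len(subsets)} subsets of J")
```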

Further laws involving inclusion appear in Table 10. Many of these laws may be 
familiar from school, where they might have been illustrated by Venn diagrams. 
Even an introductory textbook of set theory, although it might run to hundreds of 
pages, would leave the verification of most as “exercises for the reader” (with the good excuse that, in any case, one can only really learn a mathematical subject by doing exercises), and in this much shorter Element, where the aim must be less to train the reader in than to inform the reader about a technical subject, they will all be so left. (The proofs do not all have to proceed “elementwise,” as in the
commutative law example. Once one has accumulated a few laws, others can be 
derived from them “algebraically,” without going back to the definitions.) 
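Although verification is left to the reader, one can gain some confidence in these laws (short of a proof) by checking them exhaustively for one small case. The Python sketch below is only an illustration under stated assumptions: it takes a three-element $I = \{0, 1, 2\}$, represents sets as frozensets, and writes `comp(x)` for $-x = I - x$; all names are hypothetical.

```python
from itertools import chain, combinations

I = frozenset({0, 1, 2})  # assumed small stand-in for the fixed set I


def subsets(s):
    """Return every subset of s as a frozenset."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def comp(x):
    """Relative complement: -x = I - x."""
    return I - x


def check_boolean_laws():
    """Test some laws of the algebra of sets over all subsets of I; return how many subsets."""
    P = subsets(I)
    for x in P:
        assert comp(comp(x)) == x                          # double complement
        assert frozenset() <= x <= I                       # extrema
        for y in P:
            assert x & y == y & x and x | y == y | x       # commutativity
            assert comp(x & y) == comp(x) | comp(y)        # De Morgan
            assert comp(x | y) == comp(x) & comp(y)        # De Morgan (dual)
            assert (x <= y) == (comp(y) <= comp(x))        # complementarity
            assert (x <= y) == (x & y == x)                # lattice law
            assert (x <= y) == (x | y == y)                # lattice law (dual)
            for z in P:
                assert x & (y | z) == (x & y) | (x & z)    # distributivity
                assert x | (y & z) == (x | y) & (x | z)    # distributivity (dual)
                if x <= y and y <= z:
                    assert x <= z                          # transitivity
    return len(P)
```

A successful run on one small I is evidence, not proof; the elementwise argument in the text is what establishes each law for every I.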


4.2 The Algebra of Relations 


Boole’s logic covers a bit more than Aristotelian syllogistic, being a version of 
the modern logic of one-place predicates. It is still not enough to analyze 
serious mathematical arguments, which generally involve two-place
predicates (such as $\in$). The logic of many-place predicates in present-day textbooks
derives from Frege (1879) conceptually, and Giuseppe Peano and others 
notationally, but even before them, there were attempts to develop predicate 
logic in Boole’s algebraic style. 

To incorporate relation theory into set theory we must identify relations with 
sets of some kind. The first step is to ignore the distinction between a relation R 
such as parent of and what is sometimes called the “graph” of the relation, the 
set of ordered pairs (a, b) with a a parent of b. We write Rab or aRb or $(a, b) \in R$
indifferently. The second step is to identify an ordered pair (a, b) with a set of 
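some kind, most commonly using the Wiener–Kuratowski definition:


$$(a, b) = \{\{a\}, \{a, b\}\}.$$


Table 10 More Boolean laws

Name: Symbolic statement
Reflexivity: $x \subseteq x$
Antisymmetry: $x \subseteq y \wedge y \subseteq x \supset x = y$
Transitivity: $x \subseteq y \wedge y \subseteq z \supset x \subseteq z$
Extrema: $\emptyset \subseteq x \wedge x \subseteq I$
Complementarity: $x \subseteq y \equiv -y \subseteq -x$
Lattice laws: $x \subseteq y \equiv x \cap y = x$ and $x \subseteq y \equiv x \cup y = y$
Lattice laws: $x \subseteq y \wedge x \subseteq z \supset x \subseteq y \cap z$ and $x \subseteq z \wedge y \subseteq z \supset x \cup y \subseteq z$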


It would be idle to pretend this reveals what ordered pairs have been all along. 
It is an attempt to define something with all the features of ordered pairs needed 
for mathematics, without going beyond set theory. Its acceptability depends on 
prior analysis of just what is needed for mathematics. The consensus is that the 
existence for every a and b of a unique ordered pair (a, b), together with the 
following fundamental law of pairs, will do: 


Fundamental Law of Pairs: $(a, b) = (c, d)$ iff $a = c \wedge b = d$.
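The fundamental law can at least be spot-checked for the Wiener–Kuratowski definition. In the minimal Python sketch below, frozensets stand in for sets and `kpair` is a hypothetical helper name. Note the degenerate case $(a, a) = \{\{a\}\}$, which any proof of the law has to treat separately.

```python
from itertools import product


def kpair(a, b):
    """Wiener-Kuratowski ordered pair (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def check_fundamental_law(universe):
    """Check (a, b) = (c, d) iff a = c and b = d, for all a, b, c, d in universe."""
    for a, b, c, d in product(universe, repeat=4):
        assert (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    return True


# (a, a) collapses to {{a}}, since {a, a} = {a}.
assert kpair(1, 1) == frozenset({frozenset({1})})
# Order matters: (1, 2) and (2, 1) are different sets.
assert kpair(1, 2) != kpair(2, 1)
```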


Given the Wiener–Kuratowski definition, the existence of the ordered pair
follows by three applications of pairing to get {a} and {a, b} and then (a, b).
The proof of the fundamental law will be left as an exercise. One also needs, for 
any A and B, the existence of the Cartesian product: 


$$A \times B = \{(a, b) \mid a \in A \wedge b \in B\} = \{x \mid \exists a \in A\, \exists b \in B\, (x = (a, b))\}.$$


There are two interestingly different proofs. The first begins by noting 
that we have already the existence of the union of A and B, of the power
set of that union, and of the power set of the power set; while also each of
{a} and {a, b} is a subset of the union and hence (a, b) defined the
Wiener–Kuratowski way is a subset of its power set. Separation then
gives what we want: 


$$A \times B = \{x \subseteq \mathcal{P}(A \cup B) \mid \exists a \in A\, \exists b \in B\, (x = \{\{a\}, \{a, b\}\})\}.$$
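Both the separation proof just given and the replacement proof described next can be mimicked on tiny finite sets. The Python sketch below is illustrative only: frozensets stand in for sets and the helper names are hypothetical. It builds $A \times B$ once by separating Wiener–Kuratowski pairs out of all subsets of $\mathcal{P}(A \cup B)$, and once by two rounds of replacement followed by a union, and checks that the two agree.

```python
from itertools import chain, combinations


def kpair(a, b):
    """Wiener-Kuratowski ordered pair {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def power_set(s):
    """The set of all subsets of s."""
    items = list(s)
    return frozenset(frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)))


def product_by_separation(A, B):
    """Separate, from all x that are subsets of P(A | B), those equal to some (a, b)."""
    candidates = power_set(power_set(A | B))
    return frozenset(x for x in candidates
                     if any(x == kpair(a, b) for a in A for b in B))


def product_by_replacement(A, B):
    """Replacement twice (b -> (a, b), then a -> {a} x B), then the union of the family."""
    family = frozenset(frozenset(kpair(a, b) for b in B) for a in A)
    return frozenset().union(*family)


A, B = frozenset({1, 2}), frozenset({2, 3})
assert product_by_separation(A, B) == product_by_replacement(A, B)
assert len(product_by_replacement(A, B)) == 4
```

The separation route is doubly exponential in the size of $A \cup B$, which is harmless in an existence proof but restricts the sketch to very small examples.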


The second begins by applying replacement twice to conclude:


$$\{a\} \times B = \{(a, b) \mid b \in B\} \text{ exists for each } a \in A.$$
$$\{\{a\} \times B \mid a \in A\} \text{ exists.}$$


Then $A \times B = \bigcup\{\{a\} \times B \mid a \in A\}$ exists.


Table 11 Defined relation-theoretic notions

Name: Symbol = Definition
Domain: $\operatorname{dom} R = \{a \mid \exists b\, aRb\}$
Range: $\operatorname{ran} R = \{b \mid \exists a\, aRb\}$
Restriction: $R|C = \{(x, b) \in R \mid x \in C\}$
Image: $R[C] = \{b \mid \exists x \in C\, xRb\}$
Inverse: $R^{-1} = \{(b, a) \mid aRb\}$
Composition: $R \circ S = \{(a, c) \mid \exists b\, (aRb \wedge bSc)\}$

There is a host of definitions that can now be made, some assembled in Table 
11. (It is traditional to illustrate some of them by kinship relations. Thus the
inverse of parent of is child of and the composition of sister of and parent of is
aunt of.) Note that in the table the a and the b in dom R and ran R will
automatically belong to $\bigcup\bigcup R$ under the Wiener–Kuratowski definition, so
domain and range exist by separation. We leave the existence question in the
other cases to the reader. There are other terms in use: the inverse of R is
alternatively called the converse, while $R^{-1}[D]$ is called the preimage of D, and
$\operatorname{dom} R \cup \operatorname{ran} R$ the field of R.
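Here is a small Python sketch of the notions in Table 11, applied to the kinship illustrations. To keep it short it makes some simplifying assumptions: Python tuples stand in for ordered pairs rather than Wiener–Kuratowski sets, the tiny family is invented for the example, and the function names are hypothetical. `compose` follows the table's convention $R \circ S = \{(a, c) \mid \exists b\, (aRb \wedge bSc)\}$, so R is applied first.

```python
def dom(R):
    return {a for (a, b) in R}


def ran(R):
    return {b for (a, b) in R}


def restrict(R, C):
    """R|C: the pairs of R whose first coordinate lies in C."""
    return {(x, b) for (x, b) in R if x in C}


def image(R, C):
    """R[C]: every b such that xRb for some x in C."""
    return {b for (x, b) in R if x in C}


def inverse(R):
    return {(b, a) for (a, b) in R}


def compose(R, S):
    """R o S = {(a, c) | exists b with aRb and bSc}."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}


def field(R):
    return dom(R) | ran(R)


# A hypothetical family: Ann is a parent of Carl and Dora; Carl is a parent of Finn.
parent_of = {("Ann", "Carl"), ("Ann", "Dora"), ("Carl", "Finn")}
sister_of = {("Dora", "Carl")}

child_of = inverse(parent_of)            # contains ("Finn", "Carl")
aunt_of = compose(sister_of, parent_of)  # {("Dora", "Finn")}
```

The preimage $R^{-1}[D]$ needs no separate helper: it is `image(inverse(R), D)`.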

Students of mathematics encounter these definitions gradually in the course 
of studying this or that branch of mathematics, rather than in a bloc in a separate 
course on set theory. Readers encountering the lot all at once may think of 
learning them as like learning vocabulary in a foreign language, and try to 
absorb a few each day. 

These notions are connected by an endless list of little laws, such as 
$R[C] = \operatorname{ran}(R|C)$, that follow at once from the definitions. Such laws occupy
page after weary page in the first volume of Whitehead and Russell’s monu- 
mental Principia Mathematica (1910). A few are usually singled out for special 
mention: 


$$(R \circ S)^{-1} = S^{-1} \circ R^{-1} \qquad (R \circ S) \circ T = R \circ (S \circ T)$$
$$R[C \cup D] = R[C] \cup R[D] \qquad R[C \cap D] \subseteq R[C] \cap R[D]$$
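These laws, together with $R[C] = \operatorname{ran}(R|C)$, can be spot-checked on randomly generated finite relations. The sketch below restates the Table 11 helpers in Python (tuples as pairs, hypothetical names) and tests each law on a few hundred random cases with a fixed seed. This is sampling, not proof. The final lines show that the last law is only an inclusion.

```python
import random


def ran(R):
    return {b for (a, b) in R}


def restrict(R, C):
    return {(x, b) for (x, b) in R if x in C}


def image(R, C):
    return {b for (x, b) in R if x in C}


def inverse(R):
    return {(b, a) for (a, b) in R}


def compose(R, S):
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}


def random_relation(rng, universe, size):
    return {(rng.choice(universe), rng.choice(universe)) for _ in range(size)}


def check_relation_laws(trials=300, seed=0):
    rng = random.Random(seed)
    U = list(range(5))
    for _ in range(trials):
        R, S, T = (random_relation(rng, U, 6) for _ in range(3))
        C, D = set(rng.sample(U, 2)), set(rng.sample(U, 3))
        assert image(R, C) == ran(restrict(R, C))
        assert inverse(compose(R, S)) == compose(inverse(S), inverse(R))
        assert compose(compose(R, S), T) == compose(R, compose(S, T))
        assert image(R, C | D) == image(R, C) | image(R, D)
        assert image(R, C & D) <= image(R, C) & image(R, D)
    return True


# The inclusion can be strict: here C & D is empty, but R[C] & R[D] is not.
R = {(1, "x"), (2, "x")}
assert image(R, {1} & {2}) == set() and image(R, {1}) & image(R, {2}) == {"x"}
```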


There is also special vocabulary for special features a relation may or may not
possess, shown in Table 12 (wherein defining conditions are supposed to hold
for all a, b, c in the field of the relation).


Table 12 Properties of relations

Name: Definition
Reflexive: $aRa$
Irreflexive: $\neg aRa$
Symmetric: $aRb \supset bRa$
Antisymmetric: $aRb \wedge bRa \supset a = b$
Transitive: $aRb \wedge bRc \supset aRc$
Connected (reflexive case): $aRb \vee bRa$
Connected (irreflexive case): $aRb \vee a = b \vee bRa$


Officially, a relation is a set of ordered
pairs, so since there is no set of all pairs (x, y) such that $x \in y$ (the assumption
that there is can without much difficulty be shown to imply the existence of a
universal set), elementhood is not a relation in the official sense; neither is
inclusion $\subseteq$. We can still call them relationships, and apply the terminology in
the table. (There is a variant NBG of ZFC in which they are treated more
formally as “classes,” collections assumed over and above sets.) Thus inclusion
is reflexive, elementhood irreflexive.
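The properties in Table 12 are easy to test for finite relations. In the Python sketch below (tuples as pairs, hypothetical names), conditions are checked for all a, b, c in the field, as the table specifies. A single `connected` test uses the irreflexive-case condition; for a reflexive relation it is equivalent to the reflexive-case one. Inclusion and elementhood are not relations in the official sense, but their restrictions to a small set are. The sketch therefore examines inclusion on the subsets of {0, 1} and elementhood on the three sets $\emptyset$, $\{\emptyset\}$, $\{\emptyset, \{\emptyset\}\}$, both chosen purely for illustration.

```python
from itertools import product


def field(R):
    return {x for pair in R for x in pair}


def reflexive(R):
    return all((a, a) in R for a in field(R))


def irreflexive(R):
    return all((a, a) not in R for a in field(R))


def symmetric(R):
    return all((b, a) in R for (a, b) in R)


def antisymmetric(R):
    return all(a == b for (a, b) in R if (b, a) in R)


def transitive(R):
    return all((a, c) in R for (a, b) in R for (b2, c) in R if b == b2)


def connected(R):
    """aRb or a = b or bRa, for all a, b in the field."""
    F = field(R)
    return all((a, b) in R or a == b or (b, a) in R for a, b in product(F, F))


# Inclusion restricted to the subsets of {0, 1}.
P = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]
inclusion = {(x, y) for x in P for y in P if x <= y}

# Elementhood restricted to three sets: the empty set, {empty}, {empty, {empty}}.
e0 = frozenset()
e1 = frozenset({e0})
e2 = frozenset({e0, e1})
elementhood = {(x, y) for x in (e0, e1, e2) for y in (e0, e1, e2) if x in y}
```

On these examples, inclusion comes out reflexive, antisymmetric, and transitive but not connected ({0} and {1} are incomparable), while elementhood comes out irreflexive, transitive, and connected.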


4.3 Functions, Orders, Equivalences 


For future reference, definitions will be collected now pertaining to three kinds 
of relation ubiquitous in mathematics. This material, admittedly a bit dry until 
we are ready to take up substantive examples, may be skimmed and referred 
back to as needed later. A function is a relation R such that for any a in dom R
there is a unique b in ran R with aRb. Often one uses lowercase letters f, g for
functions. The unique b with afb is called the value or output for argument or
input a, and denoted f(a). If, additionally, for any b in the range there is exactly
one a in the domain with f(a) = b, the function f is called injective. The notation
$f: A \to B$ indicates that f is a function with $\operatorname{dom} f = A$ and $\operatorname{ran} f \subseteq B$. If
$\operatorname{ran} f = B$, then the function is called surjective with respect to B, while bijective
means both injective and surjective, and a function that is in- or sur- or bijective
is called an in- or sur- or bijection. (Older terminology was “one-to-one” and
“onto” and “correspondence.”) In certain contexts, it proves convenient to write
the values of a function X with $\operatorname{dom} X = I$ not as X(i) but as $X_i$. With this
notation, we write the range as $\{X_i \mid i \in I\}$ and call it an indexed family with
index set I. The DeMorgan and distributive laws of Table 9 in Section 4.1,
among others, generalize to indexed families:
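$$-\bigcap\{X_i \mid i \in I\} = \bigcup\{-X_i \mid i \in I\} \qquad -\bigcup\{X_i \mid i \in I\} = \bigcap\{-X_i \mid i \in I\}$$
$$X \cap \left(\bigcup\{Y_i \mid i \in I\}\right) = \bigcup\{X \cap Y_i \mid i \in I\} \qquad X \cup \left(\bigcap\{Y_i \mid i \in I\}\right) = \bigcap\{X \cup Y_i \mid i \in I\}$$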
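The generalized laws can likewise be spot-checked. In the sketch below, an indexed family is a Python dict from indices to subsets of a fixed universe $U = \{0, \dots, 5\}$. That universe is an assumption needed so that $-X$ can mean $U - X$. Big unions and intersections are folded with `functools.reduce`, and index sets are kept nonempty. Names are hypothetical.

```python
import random
from functools import reduce

U = frozenset(range(6))  # assumed fixed set relative to which -X is taken


def comp(x):
    return U - x


def big_union(sets):
    return reduce(frozenset.union, sets, frozenset())


def big_intersection(sets):
    # Folding from U means an empty family would give U; the checks below use nonempty ones.
    return reduce(frozenset.intersection, sets, U)


def random_subset(rng):
    return frozenset(x for x in U if rng.random() < 0.5)


def check_indexed_laws(trials=200, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        index = range(rng.randint(1, 4))
        Xf = {i: random_subset(rng) for i in index}
        Yf = {i: random_subset(rng) for i in index}
        X = random_subset(rng)
        assert comp(big_intersection(Xf.values())) == big_union(comp(x) for x in Xf.values())
        assert comp(big_union(Xf.values())) == big_intersection(comp(x) for x in Xf.values())
        assert X & big_union(Yf.values()) == big_union(X & y for y in Yf.values())
        assert X | big_intersection(Yf.values()) == big_intersection(X | y for y in Yf.values())
    return True
```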


A two-place function is simply a function whose arguments are ordered pairs, 
but we write f(a, b) rather than f((a,b)) for simplicity. Note that, officially, a 
function is a set of ordered pairs, so we cannot call intersection and union two- 
place functions. We call them operations, and apply the same terminology of 
“associative” and “commutative” and so on to them as to two-place functions. 

It is easily seen that the identity on a set A, $i_A = \{(a, a) \mid a \in A\}$, is a function; also
that the inverse $f^{-1}$ of a function f is a function if f is an injection. Also, the
composition $f \circ g$ of functions f and g is a function, with domain dom f if $\operatorname{ran} f \subseteq \operatorname{dom} g$. A good
exercise is to verify that the identity function is a bijection, that the inverse of a
bijection is a bijection, and that the composition of two bijections is a bijection.
Then if we define two sets to be equipollent or equinumerous if there is a 
bijection between them, as Cantor did, it follows that equipollence or equinu- 
merosity is a reflexive, symmetric, and transitive relationship. 

Notice that under the definitions used so far, beginning from what is the most 
natural definition of composition when working on the general theory of relations,
we get for functions that $(f \circ g)(a) = g(f(a))$, where in the notation the
order of f and g gets switched. The more usual approach in mainstream mathematics,
which rather seldom considers composition of relations other than functions,
modifies the definition so as to get the result $(g \circ f)(a) = g(f(a))$.
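To see the switch in order concretely, here is a Python sketch (hypothetical names) that treats functions as sets of pairs and composes them with the relational definition of Table 11. It then compares the result with ordinary application, and tests injectivity and surjectivity in the senses just defined.

```python
def is_function(R):
    """True if no first coordinate is paired with two different second coordinates."""
    seen = {}
    for a, b in R:
        if seen.setdefault(a, b) != b:
            return False
    return True


def apply(f, a):
    """f(a): the unique b with (a, b) in f."""
    (b,) = {y for (x, y) in f if x == a}
    return b


def compose(R, S):
    """Relational composition R o S = {(a, c) | exists b with aRb and bSc}."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}


def inverse(R):
    return {(b, a) for (a, b) in R}


def injective(f):
    """For a function: distinct arguments have distinct values."""
    return len({b for (_, b) in f}) == len(f)


def surjective(f, B):
    return {b for (_, b) in f} == set(B)


f = {(1, "a"), (2, "b"), (3, "c")}
g = {("a", 10), ("b", 20), ("c", 30)}
fg = compose(f, g)  # relational order: first f, then g, so fg(x) = g(f(x))
```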

A partial order is a relation that is reflexive, antisymmetric, and transitive. 
Often we write $\le$ or a similar symbol for a partial order, and then use related
notations in more or less obvious senses:


$$x \le y \le z \text{ iff } x \le y \text{ and } y \le z \qquad x \ge y \text{ iff } y \le x \qquad x < y \text{ iff } x \le y \text{ but } x \ne y.$$


A minimal element x of a set X is one such that for no y in X is $y < x$. A
minimum or least element is one such that for all y in X we have $x \le y$. The terms maximal
and maximum or greatest are used analogously. The minimal versus minimum
distinction collapses, allowing both to be abbreviated min, for connected partial
orders, called total or linear orders or simply orders. A chain in a partial order is
a subset C of its field connected by $\le$. A wellorder is one in which every
nonempty subset of the field has a least element. A set is wellorderable if there
exists some wellorder on it (in which case, there will also exist others).
Sometimes, it is more convenient to start with the notion of a strict order $<$, a
relation that is irreflexive, transitive, and connected, and think of $\le$ as defined in
terms of $<$.
In modern mathematics, a structure $\mathcal{A}$ is a nonempty set A
equipped with extra apparatus, which in the only case of interest here will be
simply a two-place relation $R \subseteq A \times A$. A function $f: A \to B$ is an isomorphism
between structures $\mathcal{A} = (A, R)$ and $\mathcal{B} = (B, S)$ if it is a bijection and further
preserves the relevant relation, meaning that if f(a) = b and f(c) = d, then
aRc iff bSd. In the case of sets with orders A, B, since the sets A, B are
simply the fields of the relations R, S, for ordinary purposes one hardly
distinguishes between the structures and their relations, speaking of structures
as orders and of isomorphism between relations. $\mathcal{B} = (B, S)$ is a
substructure of $\mathcal{A} = (A, R)$ iff $B \subseteq A$ and S is the restriction $R \cap (B \times B)$ of
R to B. When the set $\mathcal{A}$ is a family of subsets of some set I, so $\mathcal{A} \subseteq \mathcal{P}(I)$, we will
write $(\mathcal{A}, \subseteq)$ when we really mean $(\mathcal{A}, R)$ where R is the restriction of inclusion to
$\mathcal{A}$, the set of pairs (X, Y) with $X \in \mathcal{A}$ and $Y \in \mathcal{A}$ and $X \subseteq Y$. $(\mathcal{A}, \in)$ is understood
similarly. For example, $(\mathcal{P}(I), \subseteq)$ is a partial order with minimum $\emptyset$, and its
substructure $(\mathcal{P}^*(I), \subseteq)$, where $\mathcal{P}^*(I) = \mathcal{P}(I) - \{\emptyset\}$, the family of nonempty
subsets of I, is a partial order in which every singleton {a} is minimal,
but (if I has at least two elements) there is no minimum. If $f: I \to J$ is a bijection, then there is an “induced”
isomorphism F between $(\mathcal{P}(I), \subseteq)$ and $(\mathcal{P}(J), \subseteq)$ given by $F(A) = f[A]$.
A good exercise is to check that the identity function is an isomorphism, that 
the inverse of an isomorphism is an isomorphism, and that the composition of 
two isomorphisms is an isomorphism. Then if we call structures isomorphic 
when there exists an isomorphism between them, it will follow that “isomorph- 
ism” in the sense of “being isomorphic” is a reflexive, symmetric, and transitive 
relationship. 
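The example of $\mathcal{P}(I)$ and $\mathcal{P}^*(I)$, and the induced isomorphism, can be checked mechanically for a three-element I. The Python sketch below assumes $I = \{0, 1, 2\}$ and a bijection f onto $J = \{a, b, c\}$ chosen purely for illustration. The helpers `minimal` and `minimum` (hypothetical names) follow the definitions of minimal and least elements given earlier, with `le` the partial order.

```python
from itertools import chain, combinations


def subsets(s):
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def minimal(X, le):
    """Elements x of X such that no y in X has y < x (that is, y <= x and y != x)."""
    return {x for x in X if not any(le(y, x) and y != x for y in X)}


def minimum(X, le):
    """The x in X with x <= y for every y in X, or None if there is none."""
    least = [x for x in X if all(le(x, y) for y in X)]
    return least[0] if least else None


def subset_le(x, y):
    return x <= y


I = {0, 1, 2}
P = subsets(I)
P_star = [x for x in P if x]   # the nonempty subsets

f = {0: "a", 1: "b", 2: "c"}   # an assumed bijection from I onto J = {"a", "b", "c"}


def F(A):
    """The induced map F(A) = f[A]."""
    return frozenset(f[x] for x in A)


def is_isomorphism(F, X, le):
    """Check that F is injective on X and that A <= B iff F(A) <= F(B)."""
    images = [F(A) for A in X]
    injective = len(set(images)) == len(X)
    return injective and all(le(A, B) == le(F(A), F(B)) for A in X for B in X)
```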

An equivalence relation is one that is reflexive, symmetric, and transi- 
tive. We often write E or some equals-sign-like symbol for an equivalence. 
The term is also applied to relationships, including equinumerosity $\sim$ and
isomorphism $\cong$. Given a function f with domain A, the relation aEb that
holds if f(a) = f(b) is an equivalence on A. So if X is a partition of I, then
considering f(a) = the unique cell [a] of X to which a belongs, we see that
belonging to the same cell is an equivalence on I. Inversely, if we
start with an equivalence E on I and let $[a] = \{b : aEb\}$, the family X of
all [a] is a partition of I. (To see that the [a] are nonempty and that
$\bigcup X = I$, use the reflexivity of E to conclude $a \in [a]$. Symmetry and transitivity
can be used to show that for all a and b, if aEb then [a] = [b], while if
not aEb then $[a] \cap [b] = \emptyset$.) Equivalence and partition are twins. The cell [a]
is traditionally called the “equivalence class” of a. All the foregoing notions 
can be further exemplified once we have the traditional number systems 
available. 
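The passage back and forth between equivalences and partitions can be sketched in Python for a finite I. Here $I = \{0, \dots, 9\}$ and the function $n \mapsto n \bmod 3$ are chosen purely for illustration. `kernel`, `cells`, and `same_cell` are hypothetical names for the three constructions described above.

```python
def kernel(f, A):
    """The equivalence aEb iff f(a) = f(b), as a set of pairs on A."""
    return {(a, b) for a in A for b in A if f(a) == f(b)}


def cells(E, A):
    """The family of all classes [a] = {b : aEb}."""
    return {frozenset(b for b in A if (a, b) in E) for a in A}


def is_partition(X, A):
    """Nonempty cells whose union is A, with distinct cells disjoint."""
    return (all(cell for cell in X)
            and frozenset().union(*X) == frozenset(A)
            and all(c == d or not (c & d) for c in X for d in X))


def same_cell(X, A):
    """The equivalence 'belonging to the same cell' of the partition X of A."""
    return {(a, b) for c in X for a in c for b in c}


I = range(10)
E = kernel(lambda n: n % 3, I)
X = cells(E, I)
```

Running `cells` on `same_cell(X, I)` returns X again, which is one way of saying that equivalence and partition are twins.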




5 Number Systems Within Set Theory 


A history of modern mathematics in about 300 words: It began building on a 
heritage summed up in two writers. Euclid’s Elements presented the ideal of 
rigor and its partial realization in geometry. Al-Khwarizmi (whose name gives 
us our word algorithm) in his al-Jabr w’al-Muqabala (whose title gives us our
word algebra) transmitted solutions of linear and quadratic equations. The first 
modern contributions began with the solution to cubic equations, where already 
we see a key modern feature, solving problems in one mathematical realm by 
bringing in another: Real roots of cubics are found by introducing “imagin- 
aries.” Then followed the development of an efficient algebraic notation, and 
the coordinate methods of analytic geometry, leading to algebraic solutions to 
geometric problems — at the cost of compromising the rigor of pure geometry. 
The analytic approach to ancient problems of tangency and quadrature led to the 
calculus and its notions of derivative and integral, which take us back and forth 
between a quantity varying with time and its rate of change, and provide 
apparatus for physics — at the cost of difficulties over infinities. The introduction 
of ever-new structures (non-Euclidean geometries, noncommutative algebras) 
continued apace in the nineteenth century, when rigor began to be firmed up. 
The interaction of different branches of mathematics meant a need for not just 
rigorous treatments of various branches separately, but a unified system, and 
one flexible enough to accommodate an endlessly growing array of novelties. 
The kind of framework needed was eventually provided by ZFC, and the 
reconstruction of mathematics on a set-theoretic basis became the theme of a 
French group writing a vast encyclopedia under the pseudonym Bourbaki, 
beginning with “his” (1939). The most important devices used can be seen at 
work already in the set-theoretic construction of the higher number systems 
(integral, rational, real, complex) out of the natural numbers, themselves 
explained set theoretically. This reconstruction is all that can be discussed 
here, and even it only in outline. 


5.1 Real to Complex 


Reconstruction of the traditional number systems proceeded in the reverse of the 
historical order of their introduction, in five steps, explaining a higher system in 
terms of a lower one taken for granted: (i) the complexes $\mathbb{C}$ in terms of the reals,
(ii) the reals $\mathbb{R}$ in terms of the rationals, (iii) the rationals $\mathbb{Q}$ in terms of the
integers, (iv) the integers $\mathbb{Z}$ in terms of the naturals, and (v) the naturals $\mathbb{N}$ in terms
of set theory. Here (i), (ii), and (v) will be sketched; (iii) and (iv) are much like
(i), and are nowadays covered alongside it in textbooks of abstract algebra, as (ii) is in
textbooks of analysis. For more detail, this material was first made available in
one place to undergraduates in Landau (1930), which remains highly readable. 

At no stage was a new rigorous definition of a number system put forward as 
an account of what the numbers in question had really been all along, faithful to 
the ideas — in fact, often hazy — of the first to study it. As regards step (v) this fact 
is emphasized in the philosophical classic Benacerraf (1965), but it holds 
generally. The search was for a surrogate system with all the properties trad- 
itionally assumed that are important for mathematics. Such a project presup- 
poses consensus about what the mathematically important properties are. They 
are perhaps most easily identified in the case of the complex numbers. Each is a 
sum a + bi of a real a and a real multiple b of the “imaginary” unit i, and they are
added and multiplied according to the following rules: 


$$(a + bi) + (c + di) = (a + c) + (b + d)i$$
$$(a + bi) \cdot (c + di) = (a \cdot c - b \cdot d) + (a \cdot d + b \cdot c)i.$$


That is all. In the case of the reals and naturals, the needed identification of 
mathematically important properties is due mainly to Dedekind: for the reals in 
Stetigkeit und irrationale Zahlen, for the naturals in Was sind und was sollen die 
Zahlen?, available together in English as Dedekind (1901). 

In the complex or ‘imaginary’ case, around 1800 several workers independ- 
ently came up with a geometric interpretation of complex numbers as points in 
the plane, still taught in schools. The great C. F. Gauss remarked that if instead 
of positive, negative, and imaginary one had spoken of forwards, backwards, 
and sideways, there would never have even seemed to be any mystery. In 
coordinate geometry, a plane point is represented by a pair of reals, and so 
one can simply identify a + bi with (a, b) and stipulate the desired arithmetic 
rules: 


$$(a, b) + (c, d) = (a + c, b + d) \qquad (a, b) \cdot (c, d) = (a \cdot c - b \cdot d, a \cdot d + b \cdot c).$$


It is a tedious but routine exercise to derive the usual commutative, associa- 
tive, and distributive laws for the complexes from these definitions and the same 
laws for the reals. 
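The pair definitions translate directly into code. The Python sketch below makes one simplifying assumption: a complex number is a pair of rationals (`fractions.Fraction`, so the arithmetic is exact), since a program cannot hold genuine reals. It defines the two operations exactly as displayed, confirms that $i = (0, 1)$ squares to $(-1, 0)$, and spot-checks the commutative, associative, and distributive laws on random samples. Sampling is no substitute for the routine derivation; names are hypothetical.

```python
import random
from fractions import Fraction


def c_add(z, w):
    (a, b), (c, d) = z, w
    return (a + c, b + d)


def c_mul(z, w):
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


i = (0, 1)


def random_pair(rng):
    return (Fraction(rng.randint(-9, 9), rng.randint(1, 9)),
            Fraction(rng.randint(-9, 9), rng.randint(1, 9)))


def check_field_laws(trials=500, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = random_pair(rng), random_pair(rng), random_pair(rng)
        assert c_add(x, y) == c_add(y, x) and c_mul(x, y) == c_mul(y, x)
        assert c_add(c_add(x, y), z) == c_add(x, c_add(y, z))
        assert c_mul(c_mul(x, y), z) == c_mul(x, c_mul(y, z))
        assert c_mul(x, c_add(y, z)) == c_add(c_mul(x, y), c_mul(x, z))
    return True
```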

Generalization motivates rigorization: The introduction of unfamiliar structures,
where intuitions developed from work with more familiar ones become of doubtful
reliability, was one reason for closer attention to rigor. Inversely,
rigorization often opened up the prospect of innovations. If pairs can be added 
and multiplied, what about triples or quadruples, or whatever? W. R. Hamilton 
found that there is no reasonable multiplication rule for triples, but that there is 
one for quadruples, and thus he arrived at the quaternions expounded in 
Hamilton (1853). Once a construction has given us the mathematically important
properties, mathematicians might as well forget about it, except that they
may want to come back to it when seeking to do something analogous 
elsewhere. 


5.2 Rational to Real 


Rigorous Greek mathematics recognized no zero and no negative numbers, and 
not even positive real numbers, but only ratios of geometric magnitudes such as 
line segments. Neither did it recognize even positive rational numbers, but only 
ratios of positive natural numbers, with ratios themselves scarcely considered 
independently of questions of proportionality or equality of ratios, the first is to 
the second as the third is to the fourth. Taking proportionality for natural 
numbers to be understood, proportionality for line segments can be made 
sense of whenever a ratio of segments A:B can be equated with a ratio of natural 
numbers m:n. This can be done when, on dividing B into n equal pieces and laying
off m of them along A, they exactly fill it. If they fall short or go beyond, A:B is
greater or less, as the case may be, than m:n. What is often anachronistically
described as the Greek discovery that $\sqrt{2}$ is irrational was actually the discovery
that the ratio of diagonal to side in a square is not the same as any ratio of natural
numbers. How, then, can proportionality for segments in general be defined?
The solution of Eudoxos, found in Euclid, Book V, in effect declares A : B equal
to C : D if, for every m and n, whenever either of the two segment ratios is greater than m : n,
so is the other. For us, for whom ratios of natural numbers are rational
numbers and ratios of line segments are real numbers, this says that a real 
number is completely determined by the set of rational numbers less than it. 
Early modern mathematics had an account of what (signed) real numbers are: 
ratios of lengths of (directed) line segments. Certain Euclidean straightedge and 
compass constructions can be interpreted as adding and multiplying ratios, 
making them numbers in the sense of things one can add and multiply. And 
the usual commutative, associative, and distributive laws can be deduced from well-known
geometric theorems. This is the standpoint explicit in Isaac Newton’s
Universal Arithmetic and implicit in René Descartes’ Geometry, and it seems to 
go back to at least Omar Khayyam. By the nineteenth century, especially after 
the advent of non-Euclidean geometries, it came to seem desirable to provide a 
new, nongeometric, purely arithmetical-algebraic understanding. This is what is 
done by Dedekind (and independently in a different way by Cantor). Dedekind 
removed the geometric scaffolding and, in effect, simply identified a real 
number with the set of rational numbers less than it. Needless to say, as a 
definition, “a real number is the set of rational numbers less than some real 
number” is hopelessly circular. The trick must be to characterize the relevant
sets of rational numbers without presupposing reals. The Dedekind identification
is with a cut in the rationals, a set A with three features: (i) A is neither $\emptyset$
nor the whole of $\mathbb{Q}$; (ii) every element of A is less than every nonelement; (iii) A
has no largest element. If these are taken as the real numbers, we can define the
order relation $\le$ on them simply as the inclusion relation $\subseteq$. It is necessary to
define also the arithmetic operations and show that the usual laws apply, which
Dedekind did, giving as he claimed the first rigorous proof that $\sqrt{2} \cdot \sqrt{3} = \sqrt{6}$.
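A cut is an infinite set of rationals, so a program cannot store one, but it can represent a cut by its membership test. The Python sketch below takes that approach, which is an assumption of the illustration (names are hypothetical). It gives the cut for $\sqrt{2}$, the rationals p with $p < 0$ or $p^2 < 2$. It also gives a function exhibiting feature (iii) by producing, from any member, a larger member. Finally, it checks feature (ii) and the order-as-inclusion idea on a finite grid of rationals, which is evidence rather than proof.

```python
from fractions import Fraction


def sqrt2_cut(p):
    """Membership test for the cut {p in Q | p < 0 or p*p < 2}."""
    return p < 0 or p * p < 2


def rational_cut(q):
    """The cut {p in Q | p < q} that stands in for the rational q."""
    return lambda p: p < q


def larger_member(p):
    """Feature (iii): from a member p of the sqrt2 cut, produce a larger member."""
    if not sqrt2_cut(p):
        raise ValueError("p is not in the cut")
    if p <= 0:
        return Fraction(1)
    # With q = (2p + 2)/(p + 2): q - p = (2 - p^2)/(p + 2) > 0 and q^2 - 2 = 2(p^2 - 2)/(p + 2)^2 < 0.
    return (2 * p + 2) / (p + 2)


GRID = [Fraction(n, 12) for n in range(-36, 37)]


def below_every_nonmember(cut, grid=GRID):
    """Feature (ii), on the grid: every member is less than every nonmember."""
    return all(p < q for p in grid for q in grid if cut(p) and not cut(q))


def included(cut1, cut2, grid=GRID):
    """cut1 is a subset of cut2 on the grid, i.e. cut1 <= cut2 in Dedekind's ordering."""
    return all(cut2(p) for p in grid if cut1(p))


# Starting from Fraction(1), repeated application gives 4/3, 7/5, 24/17, 41/29, ...
```

Pass `Fraction` values to `larger_member`; with a plain `int`, Python's `/` would return a float.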

For Dedekind, the crucial property of the real numbers going beyond the 
basic laws of algebra that hold already for the rationals, was continuity, equivalent
to what is known as the least upper bound (LUB) principle. Here an upper
bound for a set A is a y with $x \le y$ for every x in A, and a least upper bound
(LUB) is an upper bound y such that $y \le z$ for any other upper bound z. The LUB
principle says that any nonempty set with an upper bound has a least one, and
this is indeed just what is needed for the “intermediate value theorem” and other 
basic results of calculus. Dedekind’s proof is simplicity itself: Given a family of 
real numbers, which is to say, of cuts, with an upper bound, just take their union 
to get a least upper bound. Dedekind remarks that, in teaching introductory 
calculus, to save time it is better not to go into such things, but just rely on 
geometric intuition. But his cut construction found its way into introductory 
college textbooks, beginning with the second edition of G. H. Hardy’s Pure 
Mathematics (1914). 

It should be noted that on this construction the rational numbers Q are not 
literally included in the real numbers R, but only an isomorphic copy, with the 
rational q replaced by the cut consisting of all rationals p < q. Reconstructions
tend to distinguish the natural number 2, the integer +2, the rational +2/1, and the
real 2.000000. Some symbolic computation programs do the same, but ordinary 
mathematical usage does not. But ordinary mathematical usage is admitted to 
include many an “abuse of language.” 


5.3 Peano Postulates 


What are (for debatable historical reasons) called the Peano postulates for the 
natural numbers, zero, and successor, denoted N, 0, and S, read as follows: 
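(P1) $0 \in \mathbb{N}$

(P2) $S: \mathbb{N} \to \mathbb{N}$

(P3) $\forall x \in \mathbb{N}\, (S(x) \ne 0)$

(P4) $\forall x \in \mathbb{N}\, \forall y \in \mathbb{N}\, (S(x) = S(y) \supset x = y)$

(P5) $\forall X \subseteq \mathbb{N}\, ((0 \in X \wedge \forall x \in \mathbb{N}\, (x \in X \supset S(x) \in X)) \supset \forall x \in \mathbb{N}\, (x \in X))$.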
Here (P5) is called the principle of mathematical induction, and may be
restated as follows. Call X inductive if 0 is in X and X is “closed under” S,
meaning S(x) is in X whenever x is in X. Then any inductive set contains all
natural numbers. Applied to the set $\{x \in \mathbb{N} \mid \Phi(x)\}$, this lets us prove that a condition
$\Phi(x)$ holds for all natural numbers by proving (i) the zero or base case, that it
holds for 0; and (ii) the successor or inductive step, that it holds for S(x)
assuming the induction hypothesis that it holds for x. This method is ubiquitous
in mathematics. 

Before we can do number theory we need to have the operations of addition, 
multiplication, and exponentiation, as well as the relation of order. Addition can 
be characterized by the following recursion equations: 


(1) $x + 0 = x$
(2) $x + S(y) = S(x + y)$.


Laying down such equations is called definition by recursion, but since +
occurs on both sides of (2) we do not have here a definition comparable to those
exhibited in tables so far, which make it possible to replace the defined symbol
anywhere by an expression not involving it. But Dedekind showed how to turn
recursive specifications into honest definitions, in several stages. First, fix x, and
call a function good for x if we have the following, for all y:


(3) $0 \in \operatorname{dom} f \wedge f(0) = x$
(4) $S(y) \in \operatorname{dom} f \supset (y \in \operatorname{dom} f \wedge f(S(y)) = S(f(y)))$.


A good f behaves like the function taking y to x + y as far as it goes. One can
show by induction that if g and h are good and y is in the domain of both, then
g(y) = h(y). The zero case is immediate, since (3) specifies the value for 0. The
successor step is immediate given the induction hypothesis that g(y) = h(y),
since (4) specifies the value for S(y) given the value for y.

Second, one can show by induction that for every y there is a good f with y in
its domain. For the zero case take {(0, x)}, a function with domain {0} satisfying
(3), which also satisfies (4) “vacuously” since 0 is not a successor by (P3).
For the successor step, given the induction hypothesis that there is a good f with
y in its domain, if S(y) is already in the domain of f we are done. Otherwise,
consider $g = f \cup \{(S(y), S(f(y)))\}$, a function since f is one and S(y) is not in its
domain, which satisfies (3) since f does, and satisfies (4), since f does and S(y) is
not the successor of anything but y by (P4). Thus the conditions “for some good
f with y in its domain, f(y) = z” and “for every good f with y in its domain,
f(y) = z” are equivalent, and for each y there is a unique z such that it holds.
Replacement allows us to form the set of all pairs (y, z) satisfying the condition
to give a function $f_x$ with $f_x(0) = x$ and $f_x(S(y)) = S(f_x(y))$. Do this for all x,
and replacement then allows us to form the set $\{f_x \mid x \in \mathbb{N}\}$; after each pair (y, z) in
$f_x$ is recast as ((x, y), z), again by replacement, union allows us
to form the desired function +.
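The two-stage construction can be imitated in Python for numbers below a bound. In this sketch, Python integers serve only as labels for natural numbers. S is the successor and P the predecessor that (P4) makes unique; both names are chosen for illustration. A function good for x is a dict, built exactly as in the successor step $g = f \cup \{(S(y), S(f(y)))\}$. The graph of + is then assembled from the $f_x$ by recasting each (y, z) as ((x, y), z). Nothing here replaces the inductive proofs; it only shows the pieces fitting together.

```python
def S(n):
    """Successor (Python ints are used only as labels for natural numbers)."""
    return n + 1


def P(n):
    """The unique predecessor of a successor, by (P4)."""
    return n - 1


def good_function(x, y):
    """A function good for x, satisfying (3) and (4), with domain {0, ..., y}."""
    f = {0: x}                    # zero case: {(0, x)}
    k = 0
    for _ in range(y):            # successor steps: add the pair (S(k), S(f(k)))
        f[S(k)] = S(f[k])
        k = S(k)
    return f


def is_good(f, x):
    """Check (3): f(0) = x; and (4): for nonzero n in dom f, P(n) in dom f and f(n) = S(f(P(n)))."""
    if 0 not in f or f[0] != x:
        return False
    return all(P(n) in f and f[n] == S(f[P(n)]) for n in f if n != 0)


def plus_graph(bound):
    """The function + on arguments below bound, as a dict {(x, y): z}."""
    graph = {}
    for x in range(bound):
        f_x = good_function(x, bound - 1)
        graph.update({(x, y): z for y, z in f_x.items()})
    return graph


add = plus_graph(12)
```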

We can define $x \le y$ to hold if $x + z = y$ for some z. We can define multiplication
and exponentiation by recursion equations analogous to (1) and (2). Given these
definitions, we can establish the basic laws of arithmetic by induction. For instance,
for the associative law for addition the zero case and successor step look like this:


$$x + (y + 0) = x + y = (x + y) + 0$$


$$x + (y + S(z)) = x + S(y + z) = S(x + (y + z)) = S((x + y) + z) = (x + y) + S(z).$$


The other laws are not hard if taken in the right order (which is not always 
obvious). But we are not yet done. 

To accommodate applications to counting finite sets we need to define what it 
is for a set X to have a natural number x as the number #X of its elements. We can
then define X to be finite if we have #X = x for some x in N. The definition that 
works has #X = x if there is a bijection between X and the set of natural 
numbers < x. We need also to connect the recursive definitions of the arithmetic 
operations with the combinatorial characterizations of those same operations, 
familiar from school. In the case of addition, for the combinatorial character- 
ization we consider the union of two disjoint sets, or equivalently but more 
artificially, of disjoint copies of two sets. In the case of multiplication, it is the 
Cartesian product. In the case of exponentiation (where up-arrow notation will
be typographically more convenient than superscript notation), it is the set of
functions from one set to another. Table 13 shows what we need. To prove the
needed equalities using induction we start by showing that if #X = x and $y \notin X$,
then $\#(X \cup \{y\}) = S(x)$ and work from there.
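The combinatorial characterizations can be spot-checked for small finite sets. The Python sketch below (hypothetical names) builds the disjoint sum from tagged copies, the Cartesian product, and the set of all functions from Y to X, each function represented as a frozenset of pairs. It then compares their sizes with +, ·, and ↑ computed on the numbers. The test sets deliberately overlap, which is exactly where a plain union would undercount. Python's `0 ** 0` is 1, matching the single (empty) function from the empty set to itself.

```python
from itertools import product


def disjoint_sum(X, Y):
    """{(0, x) | x in X} | {(1, y) | y in Y}: disjoint copies of X and Y."""
    return {(0, x) for x in X} | {(1, y) for y in Y}


def cartesian(X, Y):
    return {(x, y) for x in X for y in Y}


def functions(Y, X):
    """All f: Y -> X, each represented as a frozenset of (argument, value) pairs."""
    args = list(Y)
    return {frozenset(zip(args, values))
            for values in product(list(X), repeat=len(args))}


def check_table_13(max_size=3):
    for m in range(max_size + 1):
        for n in range(max_size + 1):
            X, Y = set(range(m)), set(range(n))   # deliberately overlapping sets
            assert len(disjoint_sum(X, Y)) == m + n
            assert len(cartesian(X, Y)) == m * n
            assert len(functions(Y, X)) == m ** n
    return True
```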

Table 13 Combinatorial Characterizations

$\#X + \#Y = \#(X \oplus Y)$, where $X \oplus Y = \{(0, x) \mid x \in X\} \cup \{(1, y) \mid y \in Y\}$
$\#X \cdot \#Y = \#(X \times Y)$, where $X \times Y = \{(x, y) \mid x \in X \wedge y \in Y\}$
$\#X \uparrow \#Y = \#(X \uparrow Y)$, where $X \uparrow Y = \{f \mid f: Y \to X\}$


It “only” remains to obtain from the axioms of set theory an N and a 0 and an
S satisfying the Peano postulates. All we need is a set X with a function from X
to X that is an injection but not a surjection. For then we can pick any element
not in its range and call it zero, while calling the function itself successor, and
we will have (P3) and (P4). We can then define a subset Y of X to be inductive if it
contains 0 and is closed under S, and trivially X itself will be inductive, and 
hence the family of inductive sets nonempty. We may then take N to be the 
intersection of this family, which can easily be seen to be itself inductive, giving 
(P1) and (P2). As for (P5), it will essentially have been made true by the mere 
definition of N. Contrary to a widespread opinion, classically expressed in 
Poincaré (1905/1983), intuition is not needed, but only logic, to obtain mathematical
induction, given set theory. For set theory does, with its axiom of
infinity, supply us with a set and a nonsurjective injection of the kind required,
one for which $0 = \emptyset$ and $S(x) = \{x\}$.

This formulation is Zermelo’s. There is an equivalent alternative due to 
John von Neumann (1923/1967), with $0 = \emptyset$ but S(x) defined not as {x}
but as $x \cup \{x\}$, abbreviated $x'$. That $x' = y'$ implies $x = y$ is less obvious than
that $\{x\} = \{y\}$ implies $x = y$, but not too hard to prove using foundation.
(For $x'$ has an epsilon-maximal element, one of which all its other elements
are elements, namely x, and it cannot have any other such $y \ne x$, since that
would give $x \in y \in x$, contrary to foundation. If $x' = y'$, their epsilon-maximal
elements, x and y, must be the same.) Where the Zermelodic definition makes a
natural number the singleton of its immediate predecessor, the Neumannian
makes it the set of all its predecessors: 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2},
and this allows us to give the simple definition #X = x if there is a
bijection between X and x. It also has the advantage of generalizing to 
the transfinite, as will be seen later. Other mathematicians never think 
about whether 2 = {1} or 2 = {0,1}, but set theorists adhere to the latter. 
There are also variations on these constructions in which infinity is not needed 
for the theory of natural, integral, or rational (as contrasted with real and 
complex) numbers. 
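The two encodings are easy to compare with nested frozensets. The Python sketch below (hypothetical function names) builds Zermelo's and von Neumann's numerals for small n. It confirms that a von Neumann numeral is the set of all its predecessors while a Zermelo numeral other than 0 is a singleton, and it finds the epsilon-maximal element used in the injectivity argument above.

```python
def zermelo(n):
    """Zermelo numerals: 0 = empty set, S(x) = {x}."""
    x = frozenset()
    for _ in range(n):
        x = frozenset({x})
    return x


def von_neumann(n):
    """von Neumann numerals: 0 = empty set, S(x) = x | {x}."""
    x = frozenset()
    for _ in range(n):
        x = x | frozenset({x})
    return x


def epsilon_maximal(s):
    """Elements m of s such that every other element of s is an element of m."""
    return [m for m in s if all(e == m or e in m for e in s)]
```

For instance, `von_neumann(3)` equals `frozenset({von_neumann(0), von_neumann(1), von_neumann(2)})`, and its only epsilon-maximal element is `von_neumann(2)`.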

Set theory provides a framework for the rigorous development of all 
mathematics. Each branch, group theory or field theory or whatever, is 
concerned with some special kind of set-theoretic structure, groups or 
fields or whatever, and the “axioms” of the theory are merely the definition 
of the class of structures in question. Often ZFC is described as a
“foundation” for mathematics, but such a description is questionable. To
accommodate all mathematics, ZFC includes some assumptions more open to
doubt than what would be needed just to accommodate arithmetic. Hence, 
incorporating arithmetic into set theory is not placing it on a firmer 
foundation. Rather, it is placing it in a context where it can interact with 
other branches of mathematics, with a common standard of proof. That is 
accomplishment enough. 
6 Infinities


With the number systems available for examples, we are where Cantor was 
when he took off into the transfinite. Intuitively, the equivalence type of items
that are equivalent in some way is what they thereby have in common, as lines in
the plane that are parallel have in common their direction. Cantor understood the
power or cardinal ||A|| of a set A to be its equivalence type under equipollence or
equinumerosity ~, and the order type |A| of an ordered set A to be its equivalence
type under isomorphism ≅. He called the order types of wellorders
ordinals and the cardinals of wellorderable sets alephs. He never identified
such items with sets, but of his account of what they are (mental items created 
by acts of selective inattention), the less said the better. A good deal can be 
established while remaining silent about such issues. 


6.1 Cardinals 


Cantor defined ||A|| = ||B|| if there is a bijection f: A → B. He defined sum,
product, and power (+, ·, and exponentiation) as in Table 13 of §5.3. In particular,
with ℵ₀ = ||N||, 2^ℵ₀ is the cardinal of all zero-one sequences, or equivalently of
all sets of natural numbers. He defined ||A|| ≤ ||B|| if there is an injection
f: A → B, equivalent to there being a C with ||A|| + ||C|| = ||B||. (Consider C = B − ran f.)
Many of the laws proved by induction for natural numbers hold for cardinals 
generally, by proofs directly from the definitions. For instance, the associative 
and commutative laws for addition follow from the corresponding laws for 
union, and ultimately disjunction. Other laws fail badly, notably cancellation: 


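$$x + z = y + z \rightarrow x = y \qquad\qquad z > 0 \wedge x \cdot z = y \cdot z \rightarrow x = y$$
Counterexamples include these:
$$\aleph_0 = \aleph_0 + \aleph_0 = \aleph_0 \cdot \aleph_0 \qquad\qquad 2^{\aleph_0} = 2^{\aleph_0} + 2^{\aleph_0} = 2^{\aleph_0} \cdot 2^{\aleph_0}$$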


For products, the result about ℵ₀ follows from the codability of a pair (m, n)
of naturals by the single natural 2^m · (2n + 1), and the result about 2^ℵ₀ then
follows using general laws of exponents such as a^b · a^c = a^(b+c). Failure of
cancellation means that there is no subtraction or division for Cantor’s
transfinites, which have nothing to do with the supposed infinitesimals of
prerigorous calculus. For antisymmetry of ≤, however, even without cancellation
we still get the law.
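The coding of pairs behind ℵ₀ · ℵ₀ = ℵ₀ can be made concrete. In the sketch below, whose function names are only illustrative, (m, n) is encoded as 2^m · (2n + 1) and decoded by stripping factors of 2. Every positive integer arises from exactly one pair, so subtracting 1 would give a bijection onto all of N.

```python
def encode(m: int, n: int) -> int:
    """Code the pair (m, n) of naturals as the positive integer 2^m * (2n + 1)."""
    return 2 ** m * (2 * n + 1)

def decode(k: int) -> tuple[int, int]:
    """Recover (m, n) from k >= 1: m counts factors of 2, the odd part is 2n + 1."""
    if k < 1:
        raise ValueError("codes are positive integers")
    m = 0
    while k % 2 == 0:
        k //= 2
        m += 1
    return m, (k - 1) // 2
```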


Cantor–Bernstein Theorem If ||A|| ≤ ||B|| and ||B|| ≤ ||A||, then ||A|| = ||B||.
Textbook proofs generally presuppose that the apparatus of natural numbers
and definition by recursion has already been set up in a set-theoretic context
before this topic is broached. This is partly for historical reasons: the earliest
proof (Bernstein’s) was found when set theory was still being pursued taking the
traditional number systems for granted, without thought of redeveloping
arithmetic from set-theoretic axioms. And it is partly for pedagogical reasons: that
proof has the advantage of lending itself to illustration by a picture. But there are 
also advantages to having a proof more from first principles, such as could be 
used to develop the general theory of cardinals before the finite cardinals or 
natural numbers are singled out for special attention. Let me outline such a proof 
(due to Zermelo, building on Dedekind) as a series of exercises for the reader, 
since it is not very often presented elsewhere: 


(1) Let g: A → B and h: B → A be injections.

(2) Let B* = h[B] ⊆ A and C = h[g[A]] ⊆ B*, and let f: A → C be given by
f(x) = h(g(x)). Show that f is a bijection.

(3) Define G: P(A) → P(A) by G(X) = (A − B*) ∪ f[X]. Show that G(X) ⊆ G(Y)
whenever X ⊆ Y. Show that {X ⊆ A | G(X) ⊆ X} ≠ ∅, implying that
Z = ⋂{X ⊆ A | G(X) ⊆ X} exists.

(4) Show that G(Z) ⊆ Z. Then show that G(Z) = Z.

(5) Define k: A → B* by k(x) = f(x) if x ∈ Z and k(x) = x otherwise. Show that k
is a bijection.

(6) Let g*(x) = h⁻¹(k(x)). Show that g*: A → B is a bijection.
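The exercises can be traced on a concrete, hypothetical example. Take A to be the natural numbers and B the integers. Let g be the inclusion, and let h(b) = 4b + 2 for b ≥ 0 and h(b) = −4b for b < 0. This h is an injection of B onto the positive even numbers, so B* consists of the positive evens and C = {2, 6, 10, ...}. The least set Z with G(Z) ⊆ Z consists of A − B* together with everything obtained from it by applying f repeatedly. Membership in Z can therefore be decided by tracing an element backwards through f. In this example f strictly increases, so the trace always stops; in general, an endless backward chain would mean the element is not in Z. All function names are chosen for illustration.

```python
# Hypothetical example: A = naturals, B = integers.
def g(a: int) -> int:
    """Injection A -> B: the inclusion of the naturals in the integers."""
    return a

def h(b: int) -> int:
    """Injection B -> A onto the positive even numbers, so not surjective."""
    return 4 * b + 2 if b >= 0 else -4 * b

def h_inverse(a: int) -> int:
    """Inverse of h, defined on B* = h[B] (the positive even numbers)."""
    return (a - 2) // 4 if a % 4 == 2 else -(a // 4)

def f(a: int) -> int:
    """Step (2): f(x) = h(g(x)), a bijection from A onto C = h[g[A]]."""
    return h(g(a))

def in_B_star(a: int) -> bool:
    return a > 0 and a % 2 == 0

def in_C(a: int) -> bool:
    # C = h[g[A]] = {2, 6, 10, ...}; only called on elements of B*.
    return a % 4 == 2

def in_Z(a: int) -> bool:
    """Is a in Z, the least X with G(X) = (A - B*) ∪ f[X] ⊆ X?

    Z is A - B* plus everything reachable from it by f, so trace a backwards.
    """
    while True:
        if not in_B_star(a):
            return True           # a lies in A - B*, which G always includes
        if not in_C(a):
            return False          # a lies in B* - C, so it is not f of anything
        a = (a - 2) // 4          # step back to f^{-1}(a); it decreases here

def k(a: int) -> int:
    """Step (5): k = f on Z and the identity elsewhere, a bijection A -> B*."""
    return f(a) if in_Z(a) else a

def g_star(a: int) -> int:
    """Step (6): g*(x) = h^{-1}(k(x)), a bijection A -> B."""
    return h_inverse(k(a))
```

For instance, 4 lies in B* − C, so it is outside Z and g* sends it to h⁻¹(4) = −1, while every element of Z is sent back to itself.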


This result is used in computations. For 2^ℵ₀ = ||P(N)|| and 𝔠 = ||R|| it is easier
to show ||R|| ≤ ||P(N)|| ≤ ||R|| than to show 2^ℵ₀ = 𝔠 directly.

Cantor’s diagonal argument, used to prove 𝔠 uncountable, generalizes. Given
a set K of cardinal κ and a family of subsets of K indexed by a set of cardinal κ,
say K itself, there is a subset of K left out of the family, namely, if {X_k | k ∈ K} is
the family, {k | k ∉ X_k} is left out. This shows that 2^κ > κ for all κ, giving
indefinitely many larger and larger cardinals.
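For a finite K the diagonal construction can be checked exhaustively. The sketch below is an illustration only, and its helper names are made up. It runs through every family (X_k | k ∈ K) of subsets of a three-element K and confirms that D = {k | k ∉ X_k} differs from each X_k, since k ∈ D exactly when k ∉ X_k.

```python
from itertools import combinations, product

def subsets(K):
    """All subsets of the finite set K, as frozensets."""
    items = sorted(K)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def diagonal(family: dict) -> frozenset:
    """Given a family {k: X_k} of subsets of K indexed by K, return {k | k not in X_k}."""
    return frozenset(k for k, X in family.items() if k not in X)

K = {0, 1, 2}
all_subsets = subsets(K)                       # 2^3 = 8 subsets
for choice in product(all_subsets, repeat=len(K)):
    family = dict(zip(sorted(K), choice))      # one family indexed by K
    D = diagonal(family)
    # D disagrees with X_k about the element k, so it is left out of the family.
    assert all(D != X for X in family.values())
```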


6.2 Order Types 


By order, understand in the present discussion strict order. The reverse of an order
< is the order >. The sum and product of an order A = (A, R) and another order
B = (B, S) are the orders on A ⊕ B and A ⊗ B, respectively, given as follows.


$$(i, x) < (j, y) \iff (i = j = 0 \wedge xRy) \vee (i = 0 \wedge j = 1) \vee (i = j = 1 \wedge xSy)$$
$$(a, b) < (c, d) \iff bSd \vee (b = d \wedge aRc)$$
Table 14 Counterexamples


Order Type   Looks Like
1 + ω        (0, 0), (1, 0), (1, 1), (1, 2), (1, 3), ...
ω + 1        (0, 0), (0, 1), (0, 2), (0, 3), ... (1, 0)
2 · ω        (0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2), (0, 3), (1, 3), ...
ω · 2        (0, 0), (1, 0), (2, 0), (3, 0), ... (0, 1), (1, 1), (2, 1), (3, 1), ...


Thus the sum puts a copy of A before a copy of B, while the product orders
pairs in reverse dictionary order, first by their second components, and if these
are the same, then by their first components. The reverse ρ* of the order type ρ
of A is the order type of the reverse of A; similarly, the sum ρ + σ and product
ρ · σ of ρ and the order type σ of B are the order types of the sum and product of
A and B. It can be checked that these notions are “well-defined”: if two orders are
isomorphic, having the same type, the same is true of their reverses, and similarly
for sums and products. If we give the order types of the usual orders on naturals
N and integers Z the names ω and ζ, then ζ = ω* + ω.

Some of the usual laws hold, by extensions of the proofs used in the case of
cardinals. This includes the associative laws. Other laws that held for cardinals
fail for order types, including the commutative laws. Examples in Table 14
illustrate the point, showing 1 + ω = 2 · ω = ω, while ω + 1 is different, having
a last element with no immediate predecessor, and ω · 2 is different, having
an element with no immediate predecessor, but no last element.
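The two definitions can be read as sort keys: the sum compares tags first, and the product compares second components first. The sketch below, with illustrative names, lists finite initial pieces of the orders in Table 14. Truncating ω to its first four elements is an assumption made only so that the lists are finite.

```python
def sum_key(element):
    """Sort key for the sum order on A ⊕ B, elements being (i, x) with tag i = 0
    for A and 1 for B: tag first, then the underlying order (here < on ints)."""
    i, x = element
    return (i, x)

def product_key(pair):
    """Sort key for the product order on A ⊗ B: second component first, then first."""
    a, b = pair
    return (b, a)

omega_prefix = range(4)          # stand-in for ω: only 0, 1, 2, 3
one, two = range(1), range(2)

one_plus_omega = sorted([(0, x) for x in one] + [(1, y) for y in omega_prefix], key=sum_key)
omega_plus_one = sorted([(0, x) for x in omega_prefix] + [(1, y) for y in one], key=sum_key)
two_times_omega = sorted([(a, b) for a in two for b in omega_prefix], key=product_key)
omega_times_two = sorted([(a, b) for a in omega_prefix for b in two], key=product_key)
```

The truncation hides exactly what distinguishes the types. With all of ω present, 1 + ω has no last element while ω + 1 has one.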

Characterizations of the order types ω and λ of N and R are at least implicit in
Dedekind. The characterization of the order type η of Q is a famous theorem of
Cantor. To state these results, we need a few more notions pertaining to orders
beyond those of wellorder and continuous order. An order is dense if whenever
x < y there is a z with x < z < y, and a subset Z of A is dense in the order if
whenever x < y there is a z ∈ Z with x < z < y. Thus Q is dense in R. An order
with a countable dense subset is called separable. Thus R is separable. By
contrast, an order is discrete if every element x but the least (if any) has an
immediate predecessor, a y < x with no z between, and every element x but the
last (if any) has an immediate successor, a y > x with no z between.

The characterizing properties of the traditional orders are as in Table 15. Only 
Cantor’s famous “back and forth” argument for η will be given here. (The same
argument, going only “forth” and not “back,” shows that every countable order
is isomorphic to a suborder of Q.)

Table 15 Characterizations of Order Types


Type   Properties
ω      wellordered, discrete, no greatest element
η      countable, dense, no least or greatest element
λ      continuous, separable, no least or greatest element


So let A and B both have the properties indicated for η in the table. We may
take one of them to be the rationals. Since A and B are both countable, we can fix
an enumeration of each, which need have nothing to do with the order relations
<_A and <_B. Call f a partial isomorphism if it is a function from a finite subset
of A to a finite subset of B that preserves order.


Lemma If f is a partial isomorphism, then for any a in A − dom f there is a b in
B − ran f such that g = f ∪ {(a, b)} is a partial isomorphism.


Informally, “we can add any element we like to the domain of a partial
isomorphism.” For the proof, to preserve order, if a is < all elements of dom f in A,
we need b to be < all elements of ran f in B. If a is between two elements a* < a**
of dom f in A with no element of dom f between them, we need b to be between
b* = f(a*) and b** = f(a**) in B. If a is > all elements of dom f in A, we need b to
be > all elements of ran f in B. In each of the three cases a suitable b is available
because B has no least element, or because B is dense, or because B has no
greatest element. For definiteness, we may take for b whatever suitable element
comes earliest in the fixed enumeration of B.
Essentially the same proof shows we can add any element we like to the range
of a partial isomorphism. To prove Cantor’s characterization theorem we go back
and forth in steps, at even steps adding to the domain the earliest element of A not
yet in it, at odd steps adding to the range the earliest element of B not yet in it.
This gives a sequence of partial isomorphisms with larger and larger domain and
range until A and B have been exhausted; putting everything together, we get
an isomorphism between A and B.
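The back-and-forth construction can be run for finitely many steps. In the sketch below, A is taken, purely for illustration, to be the dyadic rationals strictly between 0 and 1, and B to be all of Q. Both are countable, dense, and without endpoints. The two enumerations are fixed but arbitrary choices of ours. Each step picks the earliest suitable partner, as in the proof of the lemma.

```python
from fractions import Fraction
from itertools import count
from math import gcd

def enumerate_A():
    """Dyadic rationals in ]0, 1[: 1/2, 1/4, 3/4, 1/8, 3/8, ..."""
    for k in count(1):
        for m in range(1, 2 ** k, 2):
            yield Fraction(m, 2 ** k)

def enumerate_B():
    """All rationals, each exactly once: 0, then ±p/q in lowest terms with p + q = 2, 3, ..."""
    yield Fraction(0)
    for total in count(2):
        for q in range(1, total):
            p = total - q
            if gcd(p, q) == 1:
                yield Fraction(p, q)
                yield Fraction(-p, q)

def partner(x, iso, target_enum):
    """Earliest y in target_enum lying among iso's values as x lies among its keys."""
    lo = max((v for k, v in iso.items() if k < x), default=None)
    hi = min((v for k, v in iso.items() if k > x), default=None)
    for y in target_enum():
        if (lo is None or lo < y) and (hi is None or y < hi):
            return y

def back_and_forth(steps: int) -> dict:
    """Build a partial isomorphism A -> B: even steps extend the domain, odd the range."""
    f = {}
    for step in range(steps):
        if step % 2 == 0:
            a = next(x for x in enumerate_A() if x not in f)
            f[a] = partner(a, f, enumerate_B)
        else:
            inverse = {v: k for k, v in f.items()}
            b = next(y for y in enumerate_B() if y not in inverse)
            f[partner(b, inverse, enumerate_A)] = b
    return f
```

After 2n steps, the first n elements of each enumeration have been used, which is the finite shadow of "until A and B have been exhausted."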

A problem of Mikhail Suslin concerning λ also deserves mention.
Separability, the existence of a countable set containing at least one element
from every open interval, implies the nonexistence of an uncountable family of
nonoverlapping open intervals. Can the latter replace separability in the
characterization? Suslin’s hypothesis (SH) is that it can. Information about the
status of SH will be provided later.


6.3 Ordinals 


We now take up von Neumann’s approach to ordinals and alephs. A set x is
transitive if every element of an element is an element, ⋃x ⊆ x, or equivalently,
every element is a subset, x ⊆ P(x). A (von Neumann) ordinal is a transitive set
on which ∈ is a strict wellorder. Then isomorphism with an ordinal α means
isomorphism with this order (α, ∈). The development of the theory of ordinals
is facilitated by foundation (although feasible without it), using the following.


Ordinal Criterion Let x be transitive and suppose ∈ is connected on x. Then x is
an ordinal.


Proof: Foundation implies irreflexivity of ∈. We then get transitivity of ∈ on x
because for elements of x, if w ∈ v ∈ u, foundation precludes having u = w or
u ∈ w, and connectedness leaves w ∈ u the only alternative. So ∈ is a strict order
on x, which foundation, with the existence of epsilon-minimal elements, then
says is a wellorder.


Ordinal Transitivity Let x be an ordinal and u an element of x. Then u is an
ordinal.


Proof: The argument just given shows that u is transitive, and connectedness of
∈ is inherited from x.
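For hereditarily finite sets, modeled again (as an illustrative assumption) by Python frozensets, foundation holds automatically because a frozenset cannot directly or indirectly contain itself. The criterion then reduces to two finite checks: transitivity, and connectedness of ∈. The sketch below applies these checks to the von Neumann numerals. It also applies them to the transitive set {0, 1, {1}}, on which ∈ fails to be connected.

```python
from itertools import combinations

EMPTY = frozenset()

def successor(x: frozenset) -> frozenset:
    """x' = x ∪ {x}."""
    return x | frozenset({x})

def is_transitive(x: frozenset) -> bool:
    """Every element of an element of x is an element of x (⋃x ⊆ x)."""
    return all(y <= x for y in x)

def epsilon_connected(x: frozenset) -> bool:
    """For distinct u, v in x, either u ∈ v or v ∈ u."""
    return all(u in v or v in u for u, v in combinations(x, 2))

def is_ordinal(x: frozenset) -> bool:
    """Ordinal criterion (foundation is automatic here): transitive and ∈-connected."""
    return is_transitive(x) and epsilon_connected(x)

zero = EMPTY
one = successor(zero)
two = successor(one)
three = successor(two)
odd_set = frozenset({zero, one, frozenset({one})})   # {0, 1, {1}}
```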


Ordinal Connectedness Let x and y be ordinals. Then either x ∈ y
or x = y or y ∈ x.

This is a more substantial result. Here is the proof in outline as exercises for 
the reader: 


(1) Show that if y − x ≠ ∅, and z is an element thereof, then x ∩ y ⊆ z.

(2) Show that if y − x ≠ ∅, and z is an epsilon-minimal element thereof, then
x ∩ y = z.

(3) Show that we cannot have both y − x ≠ ∅ and x − y ≠ ∅.

(4) Show that if x ≠ y, either y ⊆ x, implying x ∩ y = y, or x ⊆ y, implying
x ∩ y = x.

(5) Show that either x ∈ y or x = y or y ∈ x.


Given ordinal transitivity and connectedness, the argument for the criterion shows
that ∈ is a strict wellorder relation on ordinals; accordingly, with ordinals
we write ∈ and < interchangeably, and each ordinal becomes the set of all ordinals
less than it. By the foregoing, if, contrary to fact, the ordinals formed a set, it would
be an ordinal, and the largest ordinal, a result known as the Burali–Forti paradox.
(A “paradox” because example (2) below shows there can be no largest ordinal.)


Ordinal Examples

(1) The zero, 0 = ∅, is an ordinal, the least.

(2) The successor, x′ = x ∪ {x}, of an ordinal x is an ordinal, the least > x.

(3) The supremum sup X = ⋃X of a set X of ordinals is an ordinal, the least ≥ all
elements of X.
Proofs are left as exercises, unpacking definitions. As for notation, 1 = 0′, 2 = 1′,
..., and ω = sup{0, 1, 2, ...}, the least limit ordinal, or ordinal neither zero nor a
successor. We reserve lowercase Greek α, β, γ for ordinals. A function whose
domain is an ordinal α is called an α-sequence, and if its value for β < α is denoted
x_β, the sequence may be denoted (x_β | β < α), although, in the case of a finite
sequence, we will not distinguish those of length two from ordered pairs, and will
use the usual notation for pairs, triples, and so on; similarly with an ω-sequence
such as the zero-one sequences in the diagonal argument.

Ordinary finite induction for natural numbers has an analogue, transfinite 
induction for ordinals. 


Induction If, for all α, Φ(α) holds provided Φ(β) holds for all β < α, then Φ(α)
holds for all α.


Proof: The proviso says there is no least α for which Φ(α) fails, so by the
wellorder property there can be none at all.

In proofs by induction, the proof of the proviso often breaks up into three
cases according as α is zero, a successor, or a limit. As a simple application we
have the following.


Lemma Let ξ be an order-preserving operation on ordinals. Then for all α we
have α ≤ ξα.


Proof: The zero case is trivial. In the successor case, having α ≤ ξα, since α < α′
we must have ξα < ξα′, whence α < ξα′, whence α′ ≤ ξα′ because α′ is the
least ordinal > α. The limit case is left as an exercise.


Corollary No ordinal is isomorphic to any smaller ordinal.


For if f: α → β were an isomorphism with β < α, we would have f(β) < β,
whereas by the lemma we have β ≤ f(β).

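Along with induction we have ordinal or transfinite recursion. We can
specify α + β as follows:


$$\alpha + \beta = \begin{cases} \alpha & \text{if } \beta = 0 \\ (\alpha + \gamma)' & \text{if } \beta = \gamma' \\ \sup\{\alpha + \gamma \mid \gamma < \beta\} & \text{if } \beta \text{ is a limit} \end{cases}$$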


This can be turned into an honest definition using induction as was done for 
natural numbers in §5.3 (defining a good function, and so on). Multiplication 
and exponentiation can be similarly introduced. And some ordinary laws can be 
proved by induction, usually with the zero and successor cases being just as for 
natural numbers. With other ordinary laws, the limit case cannot be pushed 
through, and we have a refutation by counterexample, for instance: 
$$1 + \omega = \sup\{1 + n \mid n < \omega\} = \omega < \omega' = \omega + 1$$
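For ordinals below ω² the recursion can be summarized in closed form, which makes such counterexamples easy to compute. The sketch below is only that kind of summary; it does not implement the recursion itself. It represents ω · a + b as a pair (a, b) and assumes no product reaches ω². The class and function names are hypothetical.

```python
from typing import NamedTuple

class Ord(NamedTuple):
    """The ordinal ω·a + b for naturals a, b; only ordinals below ω² are represented.

    Tuple comparison (a first, then b) agrees with the ordinal order."""
    a: int
    b: int

def nat(n: int) -> Ord:
    return Ord(0, n)

OMEGA = Ord(1, 0)

def add(x: Ord, y: Ord) -> Ord:
    """Ordinal sum x + y; an infinite y absorbs the finite tail of x (b + ω = ω)."""
    if y.a > 0:
        return Ord(x.a + y.a, y.b)
    return Ord(x.a, x.b + y.b)

def mul(x: Ord, y: Ord) -> Ord:
    """Ordinal product x · y, when it stays below ω²."""
    if x == nat(0) or y == nat(0):
        return nat(0)
    if y.a > 0:
        if x.a > 0:
            raise ValueError("product is at least ω·ω = ω², not representable here")
        # finite x = b ≥ 1: b·(ω·c + d) = ω·c + b·d, since b·ω = ω
        return Ord(y.a, x.b * y.b)
    if x.a > 0:
        # (ω·a + b)·d = ω·(a·d) + b for finite d ≥ 1
        return Ord(x.a * y.b, x.b)
    return nat(x.b * y.b)
```

Here add(nat(1), OMEGA) gives OMEGA while add(OMEGA, nat(1)) gives Ord(1, 1), and mul(nat(2), OMEGA) gives OMEGA while mul(OMEGA, nat(2)) gives Ord(2, 0). These agree with 1 + ω = 2 · ω = ω, which differ from ω + 1 and ω · 2.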


The inductive definitions of addition and multiplication can be connected with 
Cantor’s notions of addition and multiplication for order types in general, in much 
the same way the inductive definitions were related to combinatorial
characterizations for the natural numbers. One more important connection remains:


Comparison Lemma Every strict wellorder A = (A, <_A) is isomorphic to
some ordinal.


Uniqueness of the ordinal, which may be called the order type of the wellorder,
follows from the corollary. For the proof of existence, suppose A is a
counterexample. Call B ⊆ A an initial segment of A if x ∈ B whenever x <_A y ∈ B.
We define an order-preserving operation f from ordinals onto an initial segment B,
thus: If f(β) has been defined for all β < α, its range cannot be all of A, else we
would have an isomorphism between α and A; so let f(α) be the <_A-least element
of A that is >_A f(β) for every β < α. We get an injective operation from all
ordinals into A, hence a bijective operation from all ordinals onto some B ⊆ A.
Replacement guarantees the existence of {f⁻¹(a) | a ∈ B}, but this is the set of all
ordinals, which cannot exist! The Burali–Forti paradox, originally a misguided
objection to Cantor, becomes a useful lemma. Here is another.


Hartogs’ Lemma For every set X there is an ordinal α with no surjection
f: X → α.


For the proof, any surjection f: X → α gives rise to an equivalence relation
f(x) = f(y) on X and hence a partition of X, as well as a strict wellorder R on the
cells of the partition, given by [x]R[y] iff f(x) < f(y); and the order type of R
is α. Every strict wellorder on the cells of a partition of X belongs to
P(P(X) ⊗ P(X)), and separation lets us form the set Y of all such R, and
then replacement lets us form the set Z of all order types of elements of Y. If β is
the least ordinal greater than all those in Z, called the Hartogs’ number of X,
then there can be no surjection f: X → β. Note that neither can there be an
injection g: β → X. The importance of his lemma from Hartogs (1915) seems to
have been only rather belatedly recognized.


6.4 Alephs 


While it can be proved by mathematical induction that no finite ordinal is 
equinumerous with any smaller one, and that there is ‘up to isomorphism’ 
only one way to order a finite set, already there are many non-isomorphic 
ways of ordering the set of positive integers, as exhibited in Table 16, which 
should be compared with Table 14 of §6.2. 
Table 16 Non-Isomorphic Wellorders


Order Type   Wellorder of Positive Integers
ω            1, 2, 3, ...
ω + 1        2, 3, 4, ... 1
ω · 2        1, 3, 5, ... 2, 4, 6, ...
ω²           1, 3, 5, ... 2, 6, 10, ... 4, 12, 20, ...
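Each row of Table 16 can be produced by sorting the positive integers with a suitable key. The keys below are one hypothetical way to do it. In the last row, the blocks are the integers with 2-adic valuation 0, 1, 2, ..., so 8, 24, 40, ... come next. The check covers only 1 through 16, so it shows just an initial piece of each block.

```python
def two_adic(n: int) -> tuple[int, int]:
    """Write n = 2^v * odd and return (v, odd)."""
    v = 0
    while n % 2 == 0:
        n //= 2
        v += 1
    return v, n

keys = {
    "ω":     lambda n: n,                  # 1, 2, 3, ...
    "ω + 1": lambda n: (n == 1, n),        # 2, 3, 4, ... then 1 last
    "ω · 2": lambda n: (n % 2 == 0, n),    # odds first, then evens
    "ω²":    two_adic,                     # block v = 0, then v = 1, ...
}

sample = range(1, 17)
orders = {name: sorted(sample, key=key) for name, key in keys.items()}
```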


With ordinals, if there is a surjection f: α → β, there is an injection g: β → α
sending γ < β to the least ξ < α with f(ξ) = γ. Conversely, given an injection g
we get a surjection f sending g(γ) to γ and anything not in ran g to 0. An initial
ordinal is one with no surjection from a smaller ordinal onto it, or equivalently
no injection from it into a smaller ordinal. We denote the Hartogs’ number
of an ordinal α by α⁺. We can define by recursion an indexing of all infinite
initial ordinals by ω₀ = ω, ω_{β+1} = (ω_β)⁺, and ω_α = sup{ω_β | β < α} at limits.
An aleph is the cardinal of a wellorderable set. Alephs may be identified with
initial ordinals: ℵ_α = ω_α. But with aleph notation + and · denote cardinal
operations, with omega notation, ordinal operations. The most important fact
about the arithmetic of alephs is that ℵ_α = ℵ_α + ℵ_α = ℵ_α · ℵ_α. This is proved by
induction. Considering here only the result for multiplication, the zero case we
know. In the successor case, suppose ℵ_α = ℵ_α · ℵ_α and β = α′, so ℵ_β = ℵ_α⁺. To
show ℵ_β = ℵ_β · ℵ_β it suffices to show that we can wellorder ω_β ⊗ ω_β in order
type ω_β, which means that the number of pairs < any given one is < ℵ_β. For this
it will be enough that the predecessors of any pair (γ, δ) all come from ζ ⊗ ζ for
some ζ < ω_β, since the number of ordinals < ζ is ≤ ℵ_α, and the number of
suitable pairs then ≤ ℵ_α · ℵ_α = ℵ_α. The order that works puts (γ, δ) < (μ, ν)
if one of the following holds.


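$$\max(\gamma, \delta) < \max(\mu, \nu)$$
$$\max(\gamma, \delta) = \max(\mu, \nu) \text{ and } \gamma < \mu$$
$$\max(\gamma, \delta) = \max(\mu, \nu) \text{ and } \gamma = \mu \text{ and } \delta < \nu$$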


The order type ξ of the predecessors of (γ, δ) in this order may be considered a
single ordinal code for the ordinal pair. The limit case is left as an exercise.
Sierpiński (1958) is a compendium of further results on cardinal and ordinal
arithmetic.

Let us tie up a loose end. For any x its transitive closure x^t is the union of the
sets f(n) defined inductively by f(0) = {x} and f(S(n)) = ⋃f(n). It contains
x, the elements of x, the elements of elements, and so on, and is a transitive set,




and one included in any transitive set containing x. Ordinal induction has a 
generalization, epsilon induction: 


Induction If, for all x, Φ(x) holds provided Φ(y) holds for all y ∈ x, then for all
x, Φ(x) holds.


For suppose Φ(x) fails. Then {y ∈ x^t | ¬Φ(y)} is nonempty and by foundation
has an epsilon-minimal element y. Since x^t is transitive, x^t contains all z ∈ y, and
by minimality, Φ(z) holds for all such z, while failing for y, contrary to
hypothesis.
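For hereditarily finite sets, again modeled by Python frozensets as an illustrative assumption, the stages f(n) of the transitive closure can be iterated until they become empty. By foundation they must eventually become empty, and their union is x^t. The helper names are hypothetical.

```python
EMPTY = frozenset()

def big_union(s: frozenset) -> frozenset:
    """⋃s: the set of all elements of elements of s."""
    return frozenset(z for y in s for z in y)

def transitive_closure(x: frozenset) -> frozenset:
    """x^t = f(0) ∪ f(1) ∪ ... with f(0) = {x} and f(n + 1) = ⋃f(n)."""
    stage = frozenset({x})        # f(0)
    closure = frozenset()
    while stage:                  # f(n) is eventually empty for hereditarily finite x
        closure |= stage
        stage = big_union(stage)
    return closure
```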

We can define by ordinal recursion V(0) = ∅, V(β + 1) = P(V(β)),
and V(α) = ⋃{V(β) | β < α} for limit α. It can be proved by ordinal induction that
V(β) ⊆ V(α) when β < α, and that all V(α) are transitive and indeed supertransitive,
meaning y ∈ V(α) whenever y ⊆ x ∈ V(α). It can then be proved by epsilon
induction that for every x there is an α with x ∈ V(α). For if for each y ∈ x there is
a β, and therefore a least β, with y ∈ V(β), replacement implies the existence of
the set of all such least β for y ∈ x. If α is their sup, then x ⊆ V(α) and x ∈ V(α′).
These V(α) should be recognizable as the boxes of the cumulative hierarchy in
Table 4 of §2.2. The least α with x ∈ V(α′) is called the rank rk(x) of x. The
intuitive ‘justification’ of the axioms of set theory in §2.2 in effect shows the
following.


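$$\mathrm{rk}(\{a, b\}) = \max(\mathrm{rk}(a), \mathrm{rk}(b)) + 1$$
$$\mathrm{rk}\left(\bigcup x\right) \le \mathrm{rk}(x)$$
$$\mathrm{rk}(\mathcal{P}(x)) = \mathrm{rk}(x) + 1$$
$$\mathrm{rk}(\omega) = \omega$$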
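On hereditarily finite sets the rank can be computed by epsilon recursion, rk(x) = sup{rk(y) + 1 | y ∈ x}, which agrees with the least α such that x ∈ V(α′). The sketch below lets frozensets stand in for sets, again an illustrative assumption. It checks the finite cases of the pairing, union, and power set facts. The infinite case rk(ω) = ω cannot be checked by any finite computation.

```python
from functools import lru_cache
from itertools import combinations

EMPTY = frozenset()

@lru_cache(maxsize=None)
def rank(x: frozenset) -> int:
    """rk(x) = max{rk(y) + 1 | y ∈ x}, with rk(∅) = 0."""
    return max((rank(y) + 1 for y in x), default=0)

def big_union(x: frozenset) -> frozenset:
    """⋃x."""
    return frozenset(z for y in x for z in y)

def power_set(x: frozenset) -> frozenset:
    """P(x), as a frozenset of frozensets."""
    items = list(x)
    return frozenset(frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r))
```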


If E is an equivalence relation, we can define the truncated equivalence
class ⟨x⟩ of x to be the set of y with xEy of minimum possible
rank, so that for any z, if xEz then rk(y) ≤ rk(z). (It will be a subset of
V(rk(x)′), existing by separation.) These truncated equivalence classes
have the one mathematically important property of equivalence types,
namely, that xEy iff ⟨x⟩ = ⟨y⟩. And so we can take as set-theoretic
surrogates for order types of nonwellorders and cardinals of nonwellorderable
sets the truncated equivalence classes with respect to isomorphism and
equinumerosity. This approach is often called “Scott’s trick” after Dana Scott, who
originated it. (The advantage of the von Neumann identifications in the case of
wellorders and wellorderable sets is that they make the order type a
specific wellorder of that type, and the cardinal a particular set equinumerous
with the given one.)
7 The Axiom of Choice


The axiom of choice (AC), of which no use has been made thus far, has both 
inconspicuous and conspicuous applications, sometimes in providing
counterexamples to conjectures, sometimes in proving positive results, including
Zermelo’s premier application to proving that R is wellorderable. 


7.1 Weak and Full Choice 


A countable union of sets of some kind means simply the union of a countable 
family of sets each of that kind. Two fundamental theorems of Cantorian theory 
read as follows. 


Theorem A A countable union of countable sets is countable.
Theorem B If A is infinite, then ||A|| ≥ ℵ₀.


To prove (A) one might argue thus: Let the family be {A_n | n ∈ N},
and let A_n = {a_mn | m ∈ N}. Then the union is the set of a_mn for (m, n)
∈ N ⊗ N, and we can use the coding of pairs of naturals by single naturals to
get an enumeration.

To prove (B), one might argue thus: Let a₀ be any element of A. Since A is not
finite, it has an element a₁ other than a₀. Again since A is not finite, it has an element
a₂ other than a₀ and a₁. And so on. {a_n | n ∈ N} is a subset of A of size ℵ₀.

Both arguments are fallacious, and the theorems cannot be proved in 
ZF = ZFC minus AC. The argument for (A) assumes we have a specific 
enumeration for each A_n, and requires (CC) below, applied to F(n) = the family
of all enumerations of A_n. That for (B) requires (DC) below, applied to the
relation R that holds between an ordered m-tuple of elements of A and an
extension to an ordered (m + 1)-tuple of elements of A whose last element
is different from its first m.


Countable Choice (CC) If F is a function with domain N with F(n) ≠ ∅ for
all n, then there is a function f with domain N with f(n) ∈ F(n) for all n.
Dependent Choice (DC) If R is a relation on a nonempty set X such that for all
x ∈ X there exists a y ∈ X with xRy, then there is a function f: N → X with
f(n)Rf(n + 1) for all n.


DC implies that there is an infinite descending sequence in any order that is 
not a wellorder. It is left to the reader to show that CC is implied by DC, and DC 
by the following: 


(AC*) Let I be any nonempty set, and X the family of its nonempty subsets.
Then there exists a function ε: X → I with ε(A) ∈ A for all A in X.
Such an ε is called a choice function for I. AC* is implied by AC as originally
formulated. (The converse is also true, and left as an exercise.) For consider the
set of pairs (A, a) with a ∈ A ∈ X and the partition of that set in which the cell of
(A, a) is the set of pairs (A, b) with the same first component. Then a selector for
this partition will be the required ε. A contrast between the special cases CC or DC
of choice and full-strength AC or AC* is that applications of the latter
tend to be conspicuous, those of the former, easily overlooked.

With full AC, Theorem A can be generalized to any infinite aleph κ, to show that a
union of ≤ κ sets each of size ≤ κ has size ≤ κ. An aleph λ is called regular if a
union of < λ sets each of size < λ has size < λ, so the result just stated can be
restated as saying that the successor λ = κ⁺ of any infinite aleph κ is regular.
Theorem A itself says ℵ₁ is regular.

Henri Lebesgue’s theory of “measure” (1902) involved an early application
of Cantor’s ideas. The theory shows how to define notions of length, area, and
volume for a family of curves, surfaces, and solids, called Lebesgue
measurable, extending far beyond those of Euclidean geometry. One can define, for
instance, for an extensive family of subsets of the unit circle, a reasonable
notion of measure or “total length” with these features:


(1) The measure of the whole circle is its circumference 2π.

(2) If one set can be carried to another by rotation, they have the same measure.

(3) The measure of the union of a countable family of pairwise disjoint sets is 
the sum of the measures of the individual sets. 


Lebesgue’s work left open whether such a notion of measure μ could be
defined for all subsets of the circle. Giuseppe Vitali (1905) answered this
question negatively by a conspicuous application of AC. Call points on the
circle equivalent if one can be carried to the other by a rotation through a
rational fraction of a full 360°. Apply AC to obtain a selector S with one
point in each cell of the induced partition. The whole circle is the
union of the rotations of S through rational angles, of which there are only
countably many, and these rotated copies are pairwise disjoint. So by (1) and (3)
the sum of the measures of these sets should be 2π. But by (2) the measures of
these sets should all be equal. And the sum of denumerably many copies of the
same quantity μ(S) must be either zero (if μ(S) = 0) or infinite (if μ(S) > 0).
Vitali’s counterexample has many elaborations, of which the most famous is the
Banach–Tarski paradox: a solid ball can be disassembled into a finite number of
pieces which can be rotated, translated, and reassembled into two balls, each the
same size as the original. See Blumenthal (1940) for an early account in English.
7.2 Equivalents of Choice


AC also has positive consequences, notably the following crucial result of 
Zermelo. 


Wellordering Principle (WO) Every set is wellorderable. 


Proof: Given a set A, fix a choice function ε for A, and let e be any
nonelement of A. Define by recursion an operation f assigning to ordinals
elements of A ∪ {e} as follows: If f(β) has been defined for β < α, consider
B = {f(β) | β < α}. Let f(α) = ε(A − B) if A − B ≠ ∅, and f(α) = e otherwise.
By Hartogs’ lemma there must be an α for which the latter case applies,
else we would have an injection of the Hartogs’ number of A into A. For the least
such α we have A = {f(β) | β < α}, and we can wellorder A by setting
f(γ)Rf(β) iff γ < β.
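On a finite set the recursion in this proof can be carried out literally. In the sketch below, the choice function ε is a hypothetical stand-in, since any rule picking an element of a nonempty set would do. A fresh sentinel object plays the role of the nonelement e. The loop stops at the first α with f(α) = e, so Hartogs' lemma is not needed in this finite case. The listing f(0), f(1), ... is the resulting wellorder. The converse step, reading off ε as "the R-least element," is included as well.

```python
def wellorder_by_choice(A: set, choose) -> list:
    """Follow f(α) = ε(A − {f(β) | β < α}) until A is exhausted.

    `choose` plays the role of the choice function ε; the sentinel `e` is a
    nonelement of A. Returns [f(0), f(1), ...]; list position gives the wellorder R."""
    e = object()                        # a fresh object, so not an element of A
    values = []
    while True:
        remaining = A - set(values)
        f_alpha = choose(remaining) if remaining else e
        if f_alpha is e:
            return values               # the least α with f(α) = e
        values.append(f_alpha)

def choice_from_wellorder(order: list):
    """Converse direction: ε(X) = the R-least element of X, R given by list position."""
    position = {a: i for i, a in enumerate(order)}
    return lambda X: min(X, key=position.__getitem__)
```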

That conversely WO implies AC* is almost immediate. To get a choice 
function for a set I having a wellorder R on I, let ε(A) = the R-least element
of A, for any nonempty A ⊆ I. Many positive applications of AC can be made
without bringing in apparatus pertaining to wellorders (which many
mathematicians would prefer to avoid) by use of the following.


Zorn’s Lemma (ZL) A partial order in which every chain has an upper bound 
has a maximal element. 


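Proof: Given the partial order P = (P, <_P), fix a choice function ε for P, and let
e be any nonelement of P. Define by recursion an order-preserving operation
from ordinals to P ∪ {e} as follows.


$$f(0) = \varepsilon(P)$$
$$f(\beta') = \begin{cases} \varepsilon(\{p \in P \mid p >_P f(\beta)\}) & \text{if } f(\beta) \text{ is not maximal} \\ e & \text{otherwise} \end{cases}$$
$$f(\alpha) = \varepsilon(\{p \in P \mid p \text{ is an upper bound to the chain } \{f(\beta) \mid \beta < \alpha\}\}) \text{ at limits } \alpha$$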


Apply Hartogs’ lemma as in the proof of WO. 

Conversely, ZL implies AC. Given a partition X of a set I, let P consist of all
partial selectors, or subsets of I containing at most one element of each cell,
partially ordered by inclusion. It is easily checked that every chain has an upper 
bound (its union), and that a maximal element must be a selector (else we could 
add one element from any cell missed). 

Assuming AC, all cardinals are alephs, and so κ · κ = κ for all infinite κ. Alfred
Tarski showed this result implies AC. See Gillman (2002). There are endless
other known equivalents. See Rubin and Rubin (1970). 




8 Topics in Higher Set Theory 


The material presented so far includes the basics of what would be covered (with 
less philosophical commentary and with a good deal of mathematical detail here 
left to the reader spelled out for the student) in any of the several fine introductory 
textbooks on various levels that are available. Halmos (1960) covers less, 
Hrbacek and Jech (1999) more. This includes about as much of set theory as 
working mathematicians in most branches of mathematics are acquainted with. 
There is a great deal more to set theory than that, but just as Euclid’s Elements 
leaves out such advanced topics as we find in the works of Apollonius and 
Archimedes, so this Element will have to pass over a vast amount of material in 
silence, or with only a few allusions. And while some ideas of the proofs of the
results that do get cited in the remainder of this work will be indicated, there will 
usually be even less detail than in the proofs or proof sketches up to this point. 
And while the names of many of the principal contributors to the subject will be 
mentioned, because the earlier of their original papers are often in French or 
German, and the later often would require years of graduate study to be able to 
read, citations merely for purposes of documenting historical attributions will be 
suppressed, except for the most important landmarks. 

Higher set theory comprises three areas of study: (i) descriptive set theory,
concerned with the reals and special sets thereof, or roughly with V(ω + 1);
(ii) continuum theory, concerned with arbitrary sets of reals, or roughly with
V(ω + 2); (iii) combinatorial set theory, concerned with arbitrary sets of
arbitrary elements, and with higher V(α). (The V notation is as in §6.4.)
They have been listed in order of decreasing direct relevance to other branches 
of mathematics. Indirect relevance is another matter, since the three areas turn 
out to be connected with each other in deep ways it will take some time to 
bring out. All three abound in questions that cannot be settled on the basis of 
the axioms of ZFC alone, so that many of the most important results to be 
reported belong to the so-called metamathematics of set theory, being the- 
orems about what isn’t a theorem of ZFC. Axioms beyond ZFC eventually get 
brought in, and here the different characters of the three areas come out. 
Axioms arguably expressing the thought that the universe of sets is maximally 
“high” (so-called large cardinal axioms) have turned out to tell us almost 
everything about area (i) and almost nothing about area (ii). Axioms arguably
expressing the thought that the universe of sets is maximally “wide” (so-called
forcing axioms) are having great impact on area (ii), but less on area (iii). But
before taking up these advanced matters, we need some sample results from 
each of the three areas. 
8.1 Descriptive Set Theory


The development of geometry within algebra and analysis and thence set theory 
involves identifying linear or plane points with elements of ℝ or ℝ × ℝ. Once
it is clear we are dealing with mathematical “spaces” and not physical space, we 
can freely introduce Euclidean spaces of any dimension, and non-Euclidean 
spaces of any kind. The theory of special sets of points on the real number line is 
called descriptive set theory, but its results extend to a wider range of spaces 
treated, for instance, in the classic Kuratowski (1966), and called Polish spaces. 

Still, we begin with ℝ. Its countably many basic sets are open intervals ]a, b[
defined as {x ∈ ℝ | a < x < b} with rational endpoints a, b. (Often in the literature,
the simple notation (a, b) is used for the open interval, but we have been using
that simple notation for the ordered pair, which we have had much more
occasion to mention. Either way [a, b] denotes the closed interval
{x ∈ ℝ | a ≤ x ≤ b}.) Two intervals ]a, b[ and ]c, d[ are separate if b < c or
d < a, so they neither overlap nor abut. In the real plane ℝ × ℝ, similar definitions
can be made starting from rectangles. The projection of a planar set A ⊆ ℝ × ℝ is
the linear set B ⊆ ℝ that would be called dom A when thinking about A as a relation,
namely, the set of first components of ordered pairs in A. Further sets of interest are
classified in point classes as defined in the adjoining Tables 17 and 18.

Using such facts as that countable unions of countable sets are countable, and
the De Morgan and distributive laws for indexed families, it can be shown that
the intersection or union of two (and hence of finitely many) sets in any of these
classes is in the class, that G_δ sets are complements of F_σ sets, that open sets are
F_σ and closed sets are G_δ, that countable unions of F_σ sets are F_σ and countable
intersections of G_δ sets are G_δ, and so on. Such basic theorems would be early
exercises in a textbook.


Table 17 Lower Point Classes

Class         Definition
open          unions of basic sets
closed        complements of open sets
F_σ           countable unions of closed sets
G_δ           countable intersections of open sets
Borel         sets obtainable from the above by further countable intersection
              and union


Table 18 Higher Point Classes

Class         Definition
analytic      projections of Borel sets
coanalytic    complements of analytic sets
PCA           projections of coanalytic sets
CPCA          complements of PCA sets
projective    sets obtainable from the above by further projection and
              complementation

Even sets in the lowest point-classes can be by ordinary standards quite 
complicated. They include all the “fractals” pictured in coffee-table books. 
Yet since there are only countably many basic sets, there are only 𝔠 open sets,
and likewise only 𝔠 projective sets. Since there are 2^𝔠 arbitrary subsets of ℝ, this
means descriptive set theory is concerned with only a small fraction. Yet sets in 
the indicated point classes include all those ordinarily encountered in analysis, 
apart from pathological counterexamples like Vitali’s. 

Higher theorems about a point class Γ are of two kinds. A regularity theorem
says that each set in Γ is “nice” in some way (Lebesgue measurability being one
kind of “niceness” especially important for other branches of mathematics). A
structural theorem says that different sets in Γ are “nicely related” in some way.
For instance, sets C, D are said to reduce sets A, B if the following relationships
hold:


$$C \subseteq A, \qquad D \subseteq B, \qquad C \cup D = A \cup B, \qquad C \cap D = \varnothing.$$


The reduction principle for Γ says any pair of sets in Γ is reduced by some
pair of sets in Γ.


Sample Structural Theorem The reduction principle holds for F_σ sets.


Proof sketch: Let A be the union of closed sets A_i and B the union of closed
sets B_i for i ∈ ℕ. Let C consist of those x such that for some i, x is in A_i but
not in B_j for any j < i, and D of those x such that for some i, x is in B_i but not
in A_j for any j ≤ i. The elements of A ∪ B that “get into A no later than into B”
are in C, while those that “get into B sooner than into A” are in D. Reduction is
easily verified, while C and D being F_σ can be verified using the fact that finite
unions and intersections of closed and open sets are simultaneously F_σ and G_δ:
C is the union over i of the sets A_i ∖ (B_0 ∪ ... ∪ B_(i−1)), each the
intersection of a closed set with an open set, and similarly for D.
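The bookkeeping in this proof, sorting each point according to whether it reaches the A side or the B side first, can be checked mechanically on finite data. The Python sketch below is only a toy stand-in: it replaces the closed sets A_i and B_i by hypothetical finite sets of integers, so the topology (and hence the F_σ part of the claim) is ignored, and the helper names `reduce_pair` and `is_reduction` are ours.

```python
from typing import List, Set, Tuple


def reduce_pair(a_parts: List[Set[int]], b_parts: List[Set[int]]) -> Tuple[Set[int], Set[int]]:
    """Build C and D from A = union of a_parts and B = union of b_parts
    (both lists assumed to have the same length), as in the proof sketch:
      x in C  iff  for some i, x in A_i and x in no B_j with j < i;
      x in D  iff  for some i, x in B_i and x in no A_j with j <= i.
    Ties (x entering A and B at the same stage) go to C."""
    c: Set[int] = set()
    d: Set[int] = set()
    for i in range(len(a_parts)):
        b_before = set().union(*b_parts[:i])        # B_0 ∪ ... ∪ B_(i-1)
        a_through = set().union(*a_parts[: i + 1])  # A_0 ∪ ... ∪ A_i
        c |= a_parts[i] - b_before
        d |= b_parts[i] - a_through
    return c, d


def is_reduction(c: Set[int], d: Set[int], a: Set[int], b: Set[int]) -> bool:
    """C, D reduce A, B: C ⊆ A, D ⊆ B, C ∪ D = A ∪ B, C ∩ D = ∅."""
    return c <= a and d <= b and (c | d) == (a | b) and not (c & d)


a_parts = [{1, 2}, {3}, {5}]
b_parts = [{3, 4}, {1}, {5, 6}]
C, D = reduce_pair(a_parts, b_parts)
print(sorted(C), sorted(D))  # [1, 2, 5] [3, 4, 6]
print(is_reduction(C, D, set().union(*a_parts), set().union(*b_parts)))  # True
```

Here 5 enters A and B at the same stage and so lands in C, while 3 enters B first and lands in D, exactly the "no later than" versus "sooner than" split of the proof.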

A perfect set P is one that (i) is closed and (ii) has no isolated points. Here (i)
implies that if every basic set containing a point x meets P, then x is in P, while
(ii) means that any basic set U that meets P contains at least two distinct points x
and y of P. From this it follows that U includes two separate basic sets V and W
that both meet P (V containing x and W containing y), which moreover may be
taken to be as short as desired in length. So we can obtain separate basic sets U_0
and U_1 of length < 1/2 that both meet P, then separate basic subsets U_00 and U_01
of U_0 and U_10 and U_11 of U_1 of length < 1/4 that all meet P, and so on. For any
infinite zero-one sequence, say σ = (0, 1, 1, . . .), the intersection of U_0 and U_01
and U_011 and so on will be a singleton {x_σ} such that every basic set containing
x_σ meets P, making x_σ an element of P, while separateness of the different U’s
means that distinct σ’s give distinct x_σ’s. Hence P contains as many elements as
there are zero-one sequences, and we have sketched a proof that any perfect set
has cardinality 𝔠.
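The binary splitting is easiest to see on one familiar perfect set. The Python sketch below uses the Cantor middle-thirds set (our choice of example; the argument above applies to any perfect P), works with closed intervals where the text uses basic open sets, and uses exact rational arithmetic; the names `cantor_interval` and `separate` are hypothetical. Prefixes of length n pick out 2^n pairwise separate intervals of length 3^−n, each inside its parent, so an infinite zero-one sequence σ = (σ_0, σ_1, ...) picks out a nested sequence of intervals whose intersection is the single point x_σ = Σ 2σ_k / 3^(k+1).

```python
from fractions import Fraction
from itertools import product
from typing import Sequence, Tuple

Interval = Tuple[Fraction, Fraction]


def cantor_interval(bits: Sequence[int]) -> Interval:
    """Closed interval [lo, hi] picked out by a finite zero-one prefix in the
    middle-thirds construction: bit 0 keeps the left third of the current
    interval, bit 1 keeps the right third."""
    lo, length = Fraction(0), Fraction(1)
    for bit in bits:
        length /= 3
        if bit == 1:
            lo += 2 * length
    return lo, lo + length


def separate(i: Interval, j: Interval) -> bool:
    """One interval ends strictly before the other begins."""
    return i[1] < j[0] or j[1] < i[0]


n = 4
boxes = {bits: cantor_interval(bits) for bits in product((0, 1), repeat=n)}
# 2**n prefixes of length n give 2**n pairwise separate intervals ...
print(all(separate(boxes[s], boxes[t]) for s in boxes for t in boxes if s != t))  # True
# ... each nested inside the interval of its one-shorter prefix.
print(all(cantor_interval(s[:-1])[0] <= lo and hi <= cantor_interval(s[:-1])[1]
          for s, (lo, hi) in boxes.items()))  # True
print(cantor_interval((0, 1, 1)))  # (Fraction(8, 27), Fraction(1, 3))
```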

A set A has the perfect set property if it is either countable or contains a
perfect subset. This implies that A is either countable or of cardinality 𝔠, but is a
stronger statement, since AC implies there exist pathological sets of size 𝔠
without perfect subsets. To sketch a proof: since there are only 𝔠 perfect sets,
with AC we can wellorder them in a sequence P_α indexed by ordinals α < 𝔠, so
that each has fewer than 𝔠 predecessors. And then we can inductively choose x_α
and y_α, distinct from each other and from all x_β and y_β with β < α, and both in
P_α, since at stage α fewer than 𝔠 points will have been chosen, while P_α has 𝔠 to
choose from. In the end, the set of all x_α will have size 𝔠, but include no perfect
set P_α, having left out y_α.

Now consider a closed set C and go back to Cantor’s construction with which 
we began in §1.1, successively discarding isolated points, taking derived sets 
indexed by ordinals. Each time a point is discarded, it belongs to a basic set with 
no other point in it, and we can say that whole basic set is discarded with the 
point. Since there are only countably many basic sets to discard, the process can 
only go on for countably many stages, and can only discard a countable set C₀ of
points. If nothing is then left, then C = C₀ is countable. If anything is left, it is a
perfect set. We have sketched the proof of the following. 


Cantor–Bendixson Theorem Any uncountable closed set is the union of a
countable and a perfect set. 


In particular, closed sets have the perfect set property. But then so does any 
union of a countable family of closed sets, since if each set in the family is 
countable, so is their union, while if any contains a perfect subset, the union 
contains it, too. So we have the following. 


Sample Regularity Theorem Every F_σ set has the perfect set property.


The Polish and Russian schools between the world wars created classical 
descriptive set theory, obtaining regularity and structural theorems much 
stronger than our samples, extending to much larger point classes: notably, 
the Lebesgue measurability (and an analogous Baire property) of all analytic 
and coanalytic sets, and the reduction principle (and a stronger uniformization 
principle) for PCA sets. But then progress halted. The reason why will emerge 
later. 
8.2 Continuum Theory


A couple of negative general theorems about arbitrary sets of reals have already 
been given: the theorem that not all sets have the perfect set property, and that 
not all are Lebesgue measurable (to which it could be added that not all have the 
property of Baire). Such theorems have at least the value for other branches of 
mathematics of warning the mathematician not to try to do the impossible. And 
such negative results are, in a sense, what motivates descriptive set theory: If a 
regularity property cannot be established for all sets, then let us look at larger 
and larger classes for which it can be established. And in order to stay among
sets with a given regularity property while performing various operations that
create new sets from old, we will want structural theorems saying that various
classes are closed under various operations.

One significant open question at this level has been mentioned in §6.2, the 
status of the Suslin hypothesis (SH). But the main question in the theory of 
arbitrary sets of reals is the continuum hypothesis (CH). Actually, there are two 
propositions so named, as follows. 


$$\text{There is no cardinal } \lambda \text{ with } \aleph_0 < \lambda < 2^{\aleph_0}. \qquad\qquad 2^{\aleph_0} = \aleph_1.$$


The second implies the first, and the first implies the second if 𝔠 is an aleph, so
assuming AC, as we will here, the two are equivalent. CH was conjectured by
Cantor, and placed first on his 1900 list of mathematical problems for the new
century by Hilbert. Like AC, CH has many equivalents and interesting
consequences (see Sierpinski, 1956). The main alternatives to CH considered have
been that 𝔠 = ℵ₂ and that 𝔠 is a fixed point of the alephs, a κ such that κ = ℵ_κ.
One isolated result proved early is Kőnig’s theorem that 𝔠 ≠ ℵ_ω. But after this
there were several decades without progress.

The reason why emerged in the middle of the twentieth century. Gödel’s First
Incompleteness theorem tells us any reasonably strong consistent mathematical
axiom system T will leave some Ψ undecidable, neither provable nor disprovable.
(By contrast, an inconsistent axiom system can prove anything. For a contradiction
Φ ∧ ¬Φ implies Φ, but also ¬Φ, hence ¬Φ ∨ Ψ, which is Φ ⊃ Ψ, while Φ and Φ ⊃ Ψ
imply Ψ, whatever it may be.) Gödel’s Second Incompleteness theorem gives a
specific example, telling us that Con(T), the assertion that T is consistent, will
be unprovable (if true). But the first natural-looking specific mathematical
statement shown to be undecidable by ZFC was CH.

Kurt Gödel (1940) and Paul Cohen (1966) proved respectively that ZFC
cannot disprove and cannot prove CH (assuming ZFC is consistent). Their
precise results are stated in Table 19. Note that, per the second incompleteness
theorem, the results are relative (“if this is consistent, so is that”) rather than
absolute (“such-and-such is consistent”).


Table 19 Undecidability “Metatheorems”

Author    Metatheorem about CH          Metatheorem about AC
Gödel     Con(ZFC) ⊃ Con(ZFC + CH)      Con(ZF) ⊃ Con(ZF + AC)
Cohen     Con(ZFC) ⊃ Con(ZFC + ¬CH)     Con(ZF) ⊃ Con(ZF + ¬AC)


Gödel’s method of proof, with the so-called inner model of constructible sets,
establishes relative consistency not
only for CH but for a generalization GCH to be introduced in §8.3. Cohen’s 
method of proof, that of so-called forcing, can be used to give a different proof
of relative consistency for CH as well as a proof of relative consistency for ¬CH.
Both methods have been found to have innumerable other applications and are
absolutely central to the ongoing work of set theorists today. While it would be
infeasible to attempt to expound these methods in any detail in a work such as
this (they are not covered even in the most thorough introductory-level texts), it
will prove possible to give some idea of their nature later.

Meanwhile let it be noted that either method can be used to prove the relative
consistency of ¬SH, as was done by Ronald Jensen using Gödel’s method and
by Thomas Jech and independently by Stanley Tennenbaum using Cohen’s 
method. Further, Robert Solovay, among many other important early applica- 
tions of Cohen’s method, used it in joint work with Tennenbaum to prove the 
relative consistency of SH. 

Although these methods originated to deal with problems at the level of the 
theory of arbitrary sets of reals, they have also applications at other levels. In 
particular, John W. Addison, using Gödel’s work, showed that regularity
theorems cannot be extended to higher point classes than the classical workers
between the world wars had handled, while Azriel Levy, using Cohen’s method, 
showed the same for structural theorems, thus explaining the impasse that had 
been reached by the Polish and Russian schools. 

All this means that, if regularity and/or structural theorems are to be obtained 
for higher point classes, or if the status of SH and/or CH is ever to be settled, 
new axioms beyond ZFC will be needed. Gödel and Cohen drew opposite
philosophical conclusions from this situation, Cohen doubting there was any
fact of the matter about whether CH is true, and Gödel (1947) advocating a
search for new axioms to prove or (as he thought more likely) disprove it. Thus
far, however, no hypothesis settling the size of 𝔠 has acquired the status of an
accepted axiom, the way AC, after some resistance, eventually did. But a 
vigorous research program continues, of which more later. 
8.3 Combinatorial Set Theory


There are two propositions called the generalized continuum hypothesis (GCH):


$$\text{There is no cardinal } \lambda \text{ with } \kappa < \lambda < 2^{\kappa}, \text{ for any } \kappa \geq \aleph_0. \qquad\qquad 2^{\aleph_\alpha} = \aleph_{\alpha+1} \text{ for all } \alpha.$$


As with CH, the second implies the first and the first implies the second 
assuming AC; but, in fact, we can simply say they are equivalent, since a 
striking result of Waclaw Sierpinski tells us the first version implies AC. (See 
Gillman 2002.) GCH is the main question of interest about the arithmetic of 
arbitrary cardinals, and perhaps as such the main question of interest in the 
theory of arbitrary sets of arbitrary elements. But it is hardly the only question of 
interest, there being, to begin with, a large body of results constituting an 
infinitary combinatorics, related to transfinite arithmetic as finite combinatorics 
(beginning with the high-school topic of counting permutations and combinations)
is related to ordinary arithmetic. As this subject may be unfamiliar even
at the level of the countable, let alone higher cardinals, let us begin with sample 
theorems at that level. 

First, a tree is a partial order (P, <) with a minimum element, in which the
predecessors of any element are wellordered. The elements are called nodes, the
minimal one the root, the order type of the predecessors of a node its level, and
the supremum of nodes’ levels the tree’s height. We will be concerned for the
moment only with trees of height ≤ ω. An infinite branch through such a tree is a
chain containing one node at level n for each n < ω.


Kőnig’s Infinity Lemma Any finitely branching tree of infinite height has an
infinite branch. 


Proof: If the tree has infinite height, it has infinitely many nodes, all above its
root x₀. Since x₀ has only finitely many nodes immediately above it, by the
“pigeon-hole principle” at least one such node x₁ must have infinitely many
nodes above it. Similarly, at least one node x₂ immediately above x₁ must have
infinitely many nodes above it, and so on. Then {x₀, x₁, x₂, . . . } is a branch.
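The proof is a recipe: stand at a node and step to an immediate successor that still has infinitely many nodes above it. For an arbitrary tree no program can decide whether a node has infinitely many nodes above it, so the Python sketch below assumes a special kind of tree (a hypothetical example of ours): nodes are 0-1 strings whose possible continuations depend only on a "state" drawn from a finite table, as in a finite automaton. For such a tree a node has infinitely many nodes above it exactly when a state lying on a cycle can be reached from its state, and the recipe can then be followed literally.

```python
from typing import Dict, Set

# Hypothetical tree: a node is a 0-1 string, and its immediate successors
# depend only on the state the string is in. Strings that ever take a 1
# die out two levels later; only the all-zero branch goes on forever.
# Successors are listed 1-first so that the choice actually matters.
SUCCESSORS: Dict[str, Dict[str, str]] = {
    "spine": {"1": "dying1", "0": "spine"},
    "dying1": {"0": "dying2", "1": "dying2"},
    "dying2": {"0": "dead", "1": "dead"},
    "dead": {},
}


def reachable(state: str) -> Set[str]:
    """States reachable from `state` (including itself) by following successors."""
    seen, stack = {state}, [state]
    while stack:
        for nxt in SUCCESSORS[stack.pop()].values():
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def on_cycle(state: str) -> bool:
    return any(state in reachable(nxt) for nxt in SUCCESSORS[state].values())


def infinitely_many_above(state: str) -> bool:
    # With finitely many states: infinitely many nodes above <=> a cycle is reachable.
    return any(on_cycle(s) for s in reachable(state))


def konig_branch(root_state: str, levels: int) -> str:
    """Follow the proof for `levels` steps and return the node reached."""
    node, state = "", root_state
    assert infinitely_many_above(state)
    for _ in range(levels):
        for symbol, nxt in SUCCESSORS[state].items():
            if infinitely_many_above(nxt):  # exists, by the pigeon-hole principle
                node, state = node + symbol, nxt
                break
        else:
            raise AssertionError("no successor with infinitely many nodes above")
    return node


print(konig_branch("spine", 6))  # 000000
```

A greedy walk that ignored the test would take the first listed successor, "1", and be stuck after two more levels; the test steers it onto the one infinite branch.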

Second, for any set X of more than two elements, [X]² denotes the set of two-element
subsets of X. A two-coloring for X is a partition of [X]² into two cells.
We may think of the elements of X as dots with the segments connecting any
pair of them colored red or blue. A homogeneous set is a subset Y of X such
that [Y]² is included in a single cell of the partition. All dots in Y are connected
with the same color.


Infinite Ramsey’s Theorem Any two-coloring of an infinite set has an infinite
homogeneous set. 
Proof: Consider the hypothesis (H): any infinite subset of X contains an element
red connected to infinitely many other elements of that subset. First assume (H)
holds. Let X₀ = X and take an element a₀ of X₀ red connected to infinitely many
elements of X₀. Let X₁ be the set of elements of X₀ red connected to a₀. Take an
element a₁ of X₁ red connected to infinitely many elements of X₁. Let X₂ be the
set of elements of X₁ red connected to a₁. And so on. {a₀, a₁, . . .} is a red
homogeneous set. Now assume (H) fails, so there is some infinite subset Y₀ of X
such that any element thereof is red connected to only finitely many elements
thereof, and hence is blue connected to infinitely many. Take an element b₀ of Y₀
blue connected to infinitely many elements of Y₀. Let Y₁ be the set of elements
of Y₀ blue connected to b₀. And so on, to obtain a blue homogeneous set.


Finite Ramsey’s Theorem For every finite m there is a finite n such that any
two-coloring of a set of size n has a homogeneous set of size m.


(For m = 3 we may take n = 6. Connect six dots red and blue and there will be a 
red or a blue triangle.) Proof sketch: Suppose for some m there is no suitable n. 
We form a finitely branching tree of configurations. At the root, there is just one 
dot. Above it at the next level are two configurations, both adding a second dot, 
but differing as to the color, red or blue, of its connection with the first. Above 
each are four configurations, each adding a third dot, but differing in the color 
pattern of its connections, red-red or red-blue or blue-red or blue-blue with the 
first two. And so on. Now go through the tree removing configurations in which 
there is a homogeneous set of size m (all configurations above one so removed 
being removed along with it). Since there is no suitable n, there will still be 
nodes at level n for all n. By Kőnig’s lemma, the tree must have an infinite
branch. But from such a branch, we obtain a configuration of infinitely many 
dots with no homogeneous set of size m and hence certainly no infinite homo- 
geneous set, contrary to the infinite Ramsey theorem. 
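For m = 3 the pruned tree of configurations from this proof is small enough to build outright. The Python sketch below (our illustration, not part of the source argument; the helper names are hypothetical) grows the tree level by level, adding one dot at a time in every possible color pattern and discarding configurations that contain a one-color triangle. The level of six-dot configurations comes out empty, which is the claim that n = 6 works for m = 3, while the five-dot level is not empty, so n = 5 would not do.

```python
from itertools import combinations, product
from typing import Dict, List, Tuple

Coloring = Dict[Tuple[int, int], str]  # (i, j) with i < j  ->  "red" or "blue"


def has_one_color_triangle(coloring: Coloring, dots: int) -> bool:
    """Is there a homogeneous set of size 3 among dots 0, ..., dots - 1?"""
    return any(
        coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]
        for a, b, c in combinations(range(dots), 3)
    )


def next_level(level: List[Coloring], dots: int) -> List[Coloring]:
    """Extend each surviving configuration on `dots` dots by a new dot, in all
    2**dots color patterns, keeping only those with no one-color triangle."""
    survivors = []
    for coloring in level:
        for pattern in product(("red", "blue"), repeat=dots):
            extended = dict(coloring)
            for old_dot, color in enumerate(pattern):
                extended[(old_dot, dots)] = color
            if not has_one_color_triangle(extended, dots + 1):
                survivors.append(extended)
    return survivors


level: List[Coloring] = [{}]  # the root: a single dot, no connections
sizes = [len(level)]
for dots in range(1, 6):      # grow from 1 dot up to 6 dots
    level = next_level(level, dots)
    sizes.append(len(level))
print(sizes[-2], sizes[-1])   # 12 0: five dots can avoid it, six cannot
```

The 12 surviving five-dot configurations are exactly those in which the red segments form a pentagon and the blue ones a pentagram.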

Both versions of Ramsey’s theorem can be generalized, to consider partitions 
of n-membered subsets rather than two-membered, and to allow m colors rather 
than two. The proof just given of the finite theorem is an example of a route to a 
result about the finite with a detour through the infinite. There are any number of 
such proofs in mathematics, a famous one being the original proof by Pafnuty
Chebyshev of Bertrand’s postulate to the effect that there is a prime between
any prime and its double, which used analysis. The proof of Fermat’s Last
Theorem by Andrew Wiles makes use of especially complicated apparatus.
Hilbert hoped it could be shown that detours through the infinite can always
be avoided, and certainly they sometimes can. Ramsey’s original (1930) and
more difficult proof of his finite theorem involved no infinitistic moves. And
Srinivasa Ramanujan and Paul Erdős have each given “elementary” proofs of
Bertrand’s postulate, “elementary” meaning in this context noninfinitistic, not
easy. There remains hope also that someone, somewhere, sometime will obtain
one for Fermat’s Last Theorem. But J. B. Paris and Leo Harrington found a version
of the finite Ramsey theorem “with bells and whistles” for which they could show
the detour through the infinite to be indispensable.

One large class of results about arbitrary sets of arbitrary elements consists of 
attempting to determine how far Ramsey’s theorem can be generalized: For 
which cardinals κ, λ, μ, and natural numbers n, is it true that any partition of
the n-membered subsets of a set K of size κ into μ classes will have a
homogeneous set of size λ, a subset L ⊆ K of that size all of whose n-membered
subsets belong to the same class? This relationship is written κ → (λ)^n_μ. In this
notation the basic version of Ramsey’s theorem says ℵ₀ → (ℵ₀)^2_2. Perhaps the best
known further result of this type is that of Erdős and Richard Rado, according
to which the relation holds for κ = 𝔠⁺ and λ = ℵ₁ and μ = ℵ₀ and n = 2.

Issues about “partition calculus” are a prominent part of “infinite combina- 
torics” although hardly the whole. Before leaving the subject, let me list just a 
few more results, whose proofs would take us too far afield here, to indicate the 
variety of questions that arise in this area, and perhaps pique the interest of some 
readers. (Hrbacek and Jech have especially good coverage for an introductory 
text. See also Kunen (1977).) 


Almost Disjoint Sets Call subsets of ℕ almost disjoint if they have finite
intersection. Then there is a family of size 𝔠 of pairwise almost disjoint infinite
subsets of ℕ.

Dickson’s Lemma Let k be a positive integer and consider a sequence of
k-tuples of natural numbers (a_(m,1), ..., a_(m,k)) for m ∈ ℕ. Then there exist
m < n with a_(m,i) ≤ a_(n,i) for i = 1, ..., k.

The Δ-System Lemma Let X be an uncountable family of finite sets of
countable ordinals. Then there is an uncountable subset Y of X and a finite set
a of countable ordinals such that b ∩ c = a for all distinct b and c in Y.

Fodor’s Lemma Let f be a function from countable ordinals to countable
ordinals with f(α) < α for all α > 0. Then there is a β such that f(α) = β
for uncountably many α.
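The first of these results has a short standard proof that can be sketched in code (the text gives no proof, so the construction here is ours to supply): code each finite 0-1 string as a natural number, and send each infinite 0-1 sequence x to the set A_x of codes of its finite initial segments. Each A_x is infinite, and two different sequences share only the initial segments before their first disagreement, so the 𝔠 sets A_x are pairwise almost disjoint. The Python below only checks this on truncations; the coding (a leading 1, read in binary) and the names `code` and `branch_set` are our own choices.

```python
from typing import Callable, Set


def code(prefix: str) -> int:
    """Code a finite 0-1 string as a positive integer, one-to-one:
    put a 1 in front and read the result in binary ('' -> 1, '0' -> 2, '1' -> 3)."""
    return int("1" + prefix, 2)


def branch_set(x: Callable[[int], int], n: int) -> Set[int]:
    """The part of A_x = {code(x|k) : k in N} coming from initial segments x|k
    of length k <= n, where x(k) is the k-th term of the infinite sequence."""
    prefix, out = "", {code("")}
    for k in range(n):
        prefix += str(x(k))
        out.add(code(prefix))
    return out


def zeros(k: int) -> int:       # 0, 0, 0, 0, ...
    return 0


def late_one(k: int) -> int:    # 0, 0, 1, 1, 1, ...
    return 1 if k >= 2 else 0


for n in (5, 50, 500):
    print(sorted(branch_set(zeros, n) & branch_set(late_one, n)))  # [1, 2, 4] each time
```

The common part, the codes of '', '0' and '00', stops growing once the two sequences disagree, however far the truncation is pushed.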


9 Metamathematics of Set Theory 


Since the first development of non-Euclidean geometry, the standard way of 
proving consistency has been by constructing models, and the novelty in Gödel’s
and Cohen’s work consisted precisely of new methods of model construction. Now 
since the existence of a model implies consistency, we cannot prove in ZFC the 
existence of a model for all of ZFC (provided ZFC is consistent, a parenthetical 
proviso left tacit henceforth). But there provably are models for large fragments,
especially among the V(a), sometimes called “natural” models. Before examining 
these we need to say what more precisely is meant by a “model,” which will involve 
examination of what is meant by “truth” and related notions, dodging paradoxes 
connected with such notions since antiquity. 


9.1 Truth 


We may begin with a modern, specifically set-theoretic paradox. Since there are only 
countably many finite sequences of symbols from a finite alphabet, there are only 
countably many ordinals definable by finite expressions. But there are uncountably 
many ordinals, so there must be undefinable ones, and among these a least. But that 
one is definable as the least undefinable ordinal. Such is Kőnig’s paradox. One
response is that in speaking of definability we must say in what language, and if it is 
that of set theory, what the argument shows is that set-theoretic definability is not set 
theoretically definable. But if set theory is supposed to accommodate everything 
mathematical, ought it not to accommodate defining definability? The response 
“that’s not mathematics, it’s metamathematics” underwhelms. 

Better to focus on what is definable. If mathematical logic is to be accommodated 
like other branches of mathematics within set theory, the objects of the language L 
of set theory must be identified with sets of some kind. Well, proofs are finite 
sequences of formulas, and formulas are finite sequences of symbols, and finite 
sequence is already a set-theoretic notion, so it remains only to identify symbols 
with sets. Epsilon and equals, the bar and caret and wedge of negation and 
conjunction and disjunction, the universal and existential quantifiers, and opening 
and closing parentheses for punctuation may be identified with the pairs (0, 1) 
through (0, 9). We also need variables x, y, z, ..., and these can be identified with 
pairs (1, 0), (1, 1), (1, 2), ... . Once it is clear we are dealing with a mathematical 
“language” and not natural languages, there is no reason not to expand it indefin- 
itely to a language adding a name a for every set a. The details of how to define 
such syntactic notions as substitution of one symbol for another when logic is 
reconstructed set theoretically need not detain us any more than how to define 
exponentiation when reconstructing analysis. It can be done, but not here. 
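The coding of symbols just described is easy to make concrete. In the Python sketch below, the order in which the nine logical and punctuation symbols receive the pairs (0, 1) through (0, 9) is our assumption (we follow the order in which the text lists them); variables are written x0, x1, x2, ... and coded as (1, 0), (1, 1), (1, 2), ...; and a formula, a finite sequence of symbols, becomes a finite sequence (here a tuple) of pairs. The names `encode` and `decode` are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]

# Assumed order: epsilon, equals, negation, conjunction, disjunction,
# universal and existential quantifiers, opening and closing parentheses.
LOGICAL = ["∈", "=", "¬", "∧", "∨", "∀", "∃", "(", ")"]
CODE = {symbol: (0, i + 1) for i, symbol in enumerate(LOGICAL)}
DECODE = {pair: symbol for symbol, pair in CODE.items()}


def encode(symbols: Sequence[str]) -> Tuple[Pair, ...]:
    """Encode a formula, given as a sequence of symbols, as a tuple of pairs."""
    out = []
    for s in symbols:
        if s in CODE:
            out.append(CODE[s])
        elif s.startswith("x") and s[1:].isdigit():
            out.append((1, int(s[1:])))  # variable x_n -> (1, n)
        else:
            raise ValueError(f"not a symbol of the language: {s!r}")
    return tuple(out)


def decode(pairs: Sequence[Pair]) -> List[str]:
    """Invert encode, showing that the coding loses no information."""
    return [DECODE[p] if p[0] == 0 else f"x{p[1]}" for p in pairs]


formula = ["∀", "x0", "¬", "(", "x0", "∈", "x0", ")"]  # nothing is a member of itself
print(encode(formula))
# ((0, 6), (1, 0), (0, 3), (0, 8), (1, 0), (0, 1), (1, 0), (0, 9))
```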

The notion of truth is “semantic” rather than syntactic, and cannot be
expressed in L, since if it could, so could the derivative notion of definability:
a is definable if there is some formula Φ(x) of L such that the sentence
∀x(Φ(x) ≡ x = a), identifying the item named a as the only one satisfying
condition Φ, is true. And if the notion of definability could be expressed in L, we
would have Kőnig’s paradox. A notion of truth in a model is, however,
expressible in L, and will be key. A model for present purposes will be a structure
ℳ = (M, ∈) with M transitive, but while model and underlying set should be
distinguished in principle, “abuse of language” can be allowed in practice, and
the key notion of a sentence (formula without free variables) Φ of the language
L(M), with names (only) for elements of M, being true in the model will be called
truth in M and written M ⊨ Φ. The accepted treatment derives from Tarski and
Vaught (1956), based on earlier work of Tarski. To begin with, we characterize it
recursively in the adjoining Table 20.


Table 20 Recursive Characterization of Truth

Clause    Definition
∈         M ⊨ a ∈ b          if a ∈ b
¬         M ⊨ ¬Φ             if not M ⊨ Φ
∧         M ⊨ (Φ ∧ Ψ)        if M ⊨ Φ and M ⊨ Ψ
∀         M ⊨ ∀xΦ(x)         if M ⊨ Φ(a) for all a in M


Table 21 Δ₀ formulas

Notion                  Formula
x = ∅                   ∀y ∈ x (y ≠ y)
x = {y, z}              y ∈ x ∧ z ∈ x ∧ ∀w ∈ x (w = y ∨ w = z)
x = (y, z)              ∃u ∈ x ∃v ∈ x (u = {y, y} ∧ v = {y, z} ∧ x = {u, v})
x = ∪y                  ∀u ∈ y ∀v ∈ u (v ∈ x) ∧ ∀v ∈ x ∃u ∈ y (v ∈ u)
x ⊆ y                   ∀z ∈ x (z ∈ y)
x ⊆ P(y)                ∀z ∈ x (z ⊆ y)
x is transitive         ∀y ∈ x ∀z ∈ y (z ∈ x)
x is an ordinal         x is transitive ∧ ∀y ∈ x ∀z ∈ x (y ∈ z ∨ y = z ∨ z ∈ y)
x is a limit ordinal    x is an ordinal ∧ ¬(x = ∅) ∧ ∀y ∈ x ∃z ∈ x (y ∈ z)
x = ω                   x is a limit ordinal ∧ ¬∃y ∈ x (y is a limit ordinal)

The clauses for =, ∨, ∃ are left to the reader. The characterizations provided
by Table 20 can be made into an honest definition as was done for natural
numbers in §5.3 (defining a good function, and so on). What is being defined is
the function t(Φ) = 1 or 0 according as Φ is true or false in M, for sentences Φ
of L(M). If we tried to imitate the construction to define simple truth (in the
universe of set theory, not just a model) for a sentence, we would need a function
like t but defined on all sentences however high the rank of items named in
them. But these are too many to form a set. Yet we can go some way towards
defining truth by restricting the logical complexity of the sentences involved. A
Δ₀ formula is one that can be abbreviated to contain only bounded quantifiers,
universal and existential, ∀y ∈ x and ∃y ∈ x as defined in Table 7 in §2.2. Many
notions can be expressed by such formulas, and a few are shown in the adjoining
Table 21. One could add “x is a partition” and “y is a selector for x” and more,
but not “x is countable” or “x = P(y),” which require unbounded quantifiers,
existential and universal, ∃y and ∀y, respectively.
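When M is finite, the recursion of Table 20 can simply be run. In the Python sketch below (an illustration under our own conventions, not the set-theoretic coding of formulas described earlier), M is one of the finite levels V(n) built from frozensets, formulas are nested tuples, names for elements of M are the elements themselves, and instead of substituting a name for a variable we record an assignment of elements to variables, which comes to the same thing. The bounded quantifiers ∀y ∈ x and ∃y ∈ x are evaluated by looking only at the elements of x.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional


def V(n: int) -> FrozenSet:
    """V(0) = empty set and V(k + 1) = P(V(k)), as nested frozensets."""
    level: FrozenSet = frozenset()
    for _ in range(n):
        items = list(level)
        level = frozenset(frozenset(c) for r in range(len(items) + 1)
                          for c in combinations(items, r))
    return level


def holds(model: FrozenSet, phi: tuple, env: Optional[Dict[str, FrozenSet]] = None) -> bool:
    """M |= phi, by recursion on phi (Table 20 plus the clauses left to the reader)."""
    env = {} if env is None else env

    def val(term):  # a variable (str) or a name, i.e. the set itself
        return env[term] if isinstance(term, str) else term

    op = phi[0]
    if op == "in":
        return val(phi[1]) in val(phi[2])
    if op == "eq":
        return val(phi[1]) == val(phi[2])
    if op == "not":
        return not holds(model, phi[1], env)
    if op == "and":
        return holds(model, phi[1], env) and holds(model, phi[2], env)
    if op == "or":
        return holds(model, phi[1], env) or holds(model, phi[2], env)
    if op in ("forall", "exists"):          # unbounded: range over all of M
        _, var, body = phi
        domain = model
    elif op in ("forall_in", "exists_in"):  # bounded: range over the bound's elements
        _, var, bound, body = phi
        domain = val(bound)
    else:
        raise ValueError(f"unknown connective {op!r}")
    results = (holds(model, body, {**env, var: a}) for a in domain)
    return all(results) if op.startswith("forall") else any(results)


M = V(3)  # {∅, {∅}, {{∅}}, {∅, {∅}}}
empty_set_exists = ("exists", "x", ("forall", "y", ("not", ("in", "y", "x"))))
everything_is_a_member = ("forall", "x", ("exists", "y", ("in", "x", "y")))
print(holds(M, empty_set_exists), holds(M, everything_is_a_member))  # True False
```

The second sentence fails in V(3) because {∅, {∅}} belongs to no element of V(3); in the universe of all sets it is true, a first glimpse of a model being "mistaken" about a sentence with unbounded quantifiers.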

A crucial property of Δ₀ sentences is absoluteness. If M₀ and M₁ are
transitive and Φ is a Δ₀ sentence of L(M₀) ∩ L(M₁), then M₀ ⊨ Φ iff M₁ ⊨ Φ.
For with an unbounded universal quantifier, when evaluating it per the ∀ clause
of Table 20 in M₀, one has to look for a possible exception throughout (although
not beyond) M₀, and there may be one in M₀ but not in M₁; by contrast, with a
bounded universal quantifier ∀x ∈ a one only needs to look at elements of a in M₀
and in M₁, and these are the same, namely all elements of a, by transitivity.
Similarly with existential quantifiers. Thus a model cannot “make a mistake” about
whether a Δ₀ sentence is true, about whether the item named a is or is not the
empty set, or whether the item named b is or is not the unordered pair of those
named c and d, and so on down the list in Table 21. We may therefore define
Δ₀ truth (truth for Δ₀ sentences) as truth in some, or equivalently every, model
containing all items named in the sentence.
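Absoluteness can be watched in action on two small transitive models. The Python sketch below (our own illustration, with formulas as nested tuples, a hypothetical encoding) takes M₀ = V(3) and M₁ = V(4), checks that the Δ₀ sentence "a is transitive" from Table 21 gets the same verdict in both for every a in M₀, and contrasts this with the unbounded sentence ∃y (a ∈ y), whose verdict for a = V(2) changes from false in M₀ to true in M₁, as a sentence with an unbounded existential quantifier may when passing to a larger model.

```python
from itertools import combinations
from typing import FrozenSet


def V(n: int) -> FrozenSet:
    """V(0) = empty set and V(k + 1) = P(V(k)), as nested frozensets."""
    level: FrozenSet = frozenset()
    for _ in range(n):
        items = list(level)
        level = frozenset(frozenset(c) for r in range(len(items) + 1)
                          for c in combinations(items, r))
    return level


def holds(model: FrozenSet, phi: tuple, env: dict) -> bool:
    """Truth in a transitive model for formulas built from in, not, or and quantifiers."""
    def val(term):
        return env[term] if isinstance(term, str) else term

    op = phi[0]
    if op == "in":
        return val(phi[1]) in val(phi[2])
    if op == "not":
        return not holds(model, phi[1], env)
    if op == "or":
        return holds(model, phi[1], env) or holds(model, phi[2], env)
    if op in ("forall_in", "exists_in"):  # bounded: only the elements of the bound
        _, var, bound, body = phi
        domain = val(bound)
    elif op in ("forall", "exists"):      # unbounded: the whole model
        _, var, body = phi
        domain = model
    else:
        raise ValueError(f"unknown connective {op!r}")
    results = (holds(model, body, {**env, var: a}) for a in domain)
    return all(results) if op.startswith("forall") else any(results)


def transitive(a):           # Δ0, from Table 21: ∀y ∈ a ∀z ∈ y (z ∈ a)
    return ("forall_in", "y", a, ("forall_in", "z", "y", ("in", "z", a)))


def member_of_something(a):  # not Δ0: ∃y (a ∈ y)
    return ("exists", "y", ("in", a, "y"))


M0, M1 = V(3), V(4)
print(all(holds(M0, transitive(a), {}) == holds(M1, transitive(a), {}) for a in M0))  # True
a = V(2)
print(holds(M0, member_of_something(a), {}), holds(M1, member_of_something(a), {}))  # False True
```

The witness that M₀ lacks is {V(2)}, which first appears in V(4).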

A Σ₁ or Π₁ formula is a Δ₀ formula with an unbounded existential or
universal quantifier out front, or a string of them (although in examples just
one will be written). Suppose M₀ and M₁ are transitive with M₀ ⊆ M₁. If a Σ₁
sentence ∃yΦ(y) of L(M₀) is true in M₀, then there is some “witness” b in M₀
such that Φ(b) is true in M₀, and Φ(b) being a Δ₀ sentence, it will still be true in
M₁, as then will be ∃yΦ(y). Σ₁ sentences “relativize up” from smaller to larger
models, while similarly Π₁ sentences “relativize down.” We may define Σ₁
truth (respectively Π₁ truth) as truth in some model (respectively all models).
Continuing, adding alternate universal and existential quantifiers, we can define
the Levy hierarchy of Σ_n and Π_n formulas for all n, with truth notions for each.
The truth definitions get longer as n gets larger. So, although every formula can
be reduced, using simple logical equivalences, to a logically equivalent formula
that is Σ_n for some n, and so called Σ_n in a generalized sense, we cannot use this
fact to define truth, since we would have to combine the Σ_n truth definitions into
a single infinitely long definition.

Truth-in-a-model will imply real Σ₁ truth, and real Π₁ truth will imply
truth-in-a-model, but not always conversely. A model may “think” something is
the uncountable P(ω), or family of all subsets of ω, when it is really only some
countable family of subsets of ω, making this mistake because all in the model is
not the same as all. A famous result of mathematical logic, the Löwenheim–Skolem
Theorem, says that any sentence or countable set of sentences that has a model has
a countable model. Applied to ZFC (assuming it has a model, which, recall, cannot be
proved, since having a model implies consistency, which cannot be proved), this
tells us that theorems about the existence of enormous uncountable sets have
countable models, an observation often called Skolem’s paradox. (See Skolem
1922/1967.) We have just seen the explanation, but some philosophers still worry
how anything we say could guarantee that when we utter “the power set of omega”
we are talking of the real P(ω) and not a countable impostor of which the things we
are saying happen to be true in some model. See Putnam (1980).


9.2 Inaccessible Cardinals and Their Status 


By the informal phrase “large cardinal” set theorists mean a cardinal that towers 
above all smaller ones in something like the way the infinite towers over the 
finite. To be counted by set theorists as “large,” a cardinal κ must be a limit,
meaning that if λ < κ, then λ⁺ < κ, and even a strong limit, meaning that if
λ < κ, then 2^λ < κ. It must also be regular, meaning that a union of < κ sets of
size < κ has size < κ. A regular strong limit is called an inaccessible (because it cannot
be reached from below by the most common methods of producing larger sets 
from smaller), and these are the smallest large cardinals, first considered by 
Zermelo (1930) and independently of him by Sierpinski jointly with Tarski. We 
will meet others later. 

But let us first see what is special about inaccessibles. 

The axiom of extensionality, in the form (2) of §3.1, ∀x∀y
(x ⊆ y ∧ y ⊆ x ⊃ x = y), is Π₁, and true, hence true in any model; and the same
goes for foundation. The truth of other axioms in a model M requires some
“closure” properties of the underlying set M. The truth of pairing in M requires
that for every a and b in M there is a c in M that M “thinks” is {a, b}. But since
this is one of those Δ₀ matters about which a model cannot be mistaken, this
means that for every a and b in M, the real {a, b} is in M, a feature called
“closure under forming pairs.” Similarly, the truth of union is equivalent to
closure under forming unions. Infinity may be alternately formulated as
∃x(x = ω), making its truth in M equivalent to presence of the real ω in M.
While “partition” and “selector” are not in Table 21 of §9.1, these notions are
Δ₀, implying choice will be true in M if every partition in M has a selector in M.

Power set requires that for every a in M there be some b in M that M
thinks is P(a), which means that for all c in M, c is in b if and only if c is a subset of
a, which, in turn, means that really b is P^M(a) = P(a) ∩ M. Closure under
P^M is the criterion for truth of the axiom. If the model is supertransitive,
containing all subsets of its elements, then P^M(a) = P(a) and the criterion
is closure under P. Separation is more complicated. By the relativization Φ^M of
a formula Φ to M is meant the replacement of every quantifier ∀x or ∃x by
∀x ∈ M or ∃x ∈ M. Truth of a sentence Φ in M is equivalent to truth of the Δ₀
sentence Φ^M. Truth in M of an instance of separation for Φ requires for each a in M
the presence in M of {x ∈ a | Φ^M(x)} ⊆ a. A more than sufficient condition is
supertransitivity.

The V(α) are supertransitive, and considerations of rank show that for limit α
they are closed under formation of pairs, unions, and power sets, besides
containing ω provided α > ω, and a selector for any partition, assuming AC.
Thus for limits α > ω we get a model of everything but replacement. A more than
sufficient condition for replacement would be that any subset of M of the same
cardinality as some element of M is an element of M. This condition is satisfied
by M = V(κ) if κ is an inaccessible cardinal, since it can be proved by induction for such a
cardinal that ||V(β)|| < κ whenever β < κ (using the strong limit property for
successors and regularity for limits). Thus we have (a sketch of) a proof that if κ
is inaccessible, then V(κ) is a model of ZFC.
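The closure properties just listed can be checked mechanically on the finite levels of the hierarchy, which shows in miniature why only limit levels are closed under pairing. The following is a minimal sketch, assuming hereditarily finite sets are modeled as nested Python frozensets; the helper names are illustrative only, and finite levels of course say nothing about inaccessibles themselves.

```python
# Assumption for illustration: hereditarily finite sets are modeled as nested
# frozensets, so only the finite levels V(0), ..., V(4) are built here.
from itertools import chain, combinations


def powerset(s):
    """All subsets of the finite set s, each as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return frozenset(frozenset(c) for c in subsets)


def V(n):
    """Finite level of the cumulative hierarchy: V(0) is empty, V(k+1) = P(V(k))."""
    level = frozenset()
    for _ in range(n):
        level = powerset(level)
    return level


def is_transitive(M):
    """Every element of an element of M is an element of M."""
    return all(y in M for x in M for y in x)


def is_supertransitive(M):
    """Transitive, and every subset of an element of M is an element of M."""
    return is_transitive(M) and all(z in M for x in M for z in powerset(x))


def closed_under_pairs(M):
    """For all a, b in M, the real {a, b} is in M."""
    return all(frozenset({a, b}) in M for a in M for b in M)


print([len(V(n)) for n in range(5)])  # [0, 1, 2, 4, 16]
print(is_supertransitive(V(4)))       # True
print(closed_under_pairs(V(3)))       # False
```

Every pair of elements of V(3) does turn up in V(4); this one-level lag is why closure under pairs (and likewise power sets) needs a limit level, and since |V(n + 1)| = 2^|V(n)|, the sizes grow as iterated powers of 2.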

If we write IC for the hypothesis of the existence of an inaccessible, we have in
outline proved IC ⊃ Con(ZFC). This means that, in contrast to the situation with
CH seen in Table 19 of §8.2, we cannot hope to prove in ZFC that Con(ZFC)
⊃ Con(ZFC + IC), because then in ZFC + IC we could prove Con(ZFC + IC),
contrary to the second incompleteness theorem unless ZFC + IC is inconsistent,
and no one hopes that. This illustrates how large cardinal axioms are inherently
risky. One cannot hope to prove even their relative consistency. Their greater
inconsistency risk is (euphemistically) called greater consistency strength, and the
larger the cardinal, the greater it is. The hypotheses get riskier and riskier until we
come to the so-called Reinhardt cardinal, the assumption of whose existence
Kenneth Kunen showed to imply a contradiction in ZFC.

A brief polemical aside (see Maddy (2017) for nuance, Mathias (1992) for 
contrast): Samuel Eilenberg and Saunders Mac Lane introduced category theory, alongside group theory
and so on, as the theory of another class of set-theoretic structures. But enthusiasts
have proclaimed it a rival to set theory as a “foundation.” What seems to be 
involved here is a difference in understanding of the word “foundation,” which 
anyhow has been avoided above in favor of “framework.” An attempt was made 
to develop an axiom system for a category of all categories, to replace ZFC. It 
failed. A translation of ZFC into category-theoretic language has been developed, 
but the translation has not helped with major set-theoretic open problems. But to 
accentuate the positive, category-theoretic work of Alexandre Grothendieck, 
partly passed into folklore, led him to posit that the universe of set theory is 
made up of larger and larger “local” universes, which on examination prove to be 
precisely V(κ) for inaccessible κ. Thus he found a way to motivate Zermelo’s
assumption that there are arbitrarily large inaccessible cardinals.
Others have argued that the thought that the cumulative hierarchy is maximally
“large” should be interpreted to suggest that anything we say in an attempt to
describe just how large will inevitably be an understatement, not only true in the
macrocosm of all sets, but “reflected” in some microcosm V(α). In something like
this way, “intrinsic” motivation (as contrasted with “extrinsic” motivation
provided by attractive consequences) has been claimed for inaccessible and
some larger cardinals by Bernays, Levy, and others. Koellner (2009) is
a sophisticated discussion of how far one can hope to go along such lines.

A less ambitious version of this idea is provable as a metatheorem about what 
is provable in ZFC, the Levy Reflection Principle. A transitive set M is Σ₁
absolute if every Σ₁ sentence of L(M) that is true is true in M: for any Δ₀
formula Φ(x, y) of L, and any a in M, if ∃yΦ(a, y) is true, so that there is a
witness b somewhere for which Φ(a, b) is true, then there is such a witness in M,
making ∃yΦ(a, y) true in M. Then we have the following:


Σ₁ Reflection Principle There exist arbitrarily large Σ₁ absolute V(α).


Given any γ, to any Δ₀ formula Φ(x, y) and any a in V(γ) for which ∃yΦ(a, y) is
true, associate the least ordinal β such that there is a witness in V(β). Replacement
allows us to form the set of all such associated ordinals, and then take their
supremum γ* (which we may take to be > γ). All Σ₁ sentences about elements of V(γ) that are really true
will be true in V(γ*). There may be Σ₁ sentences about elements of V(γ*) that are
really true but not true in V(γ*), but we can repeat the process and take γ** > γ*
and γ*** > γ**, and so on. Then the supremum α of these γs is Σ₁ absolute, since any
element of V(α) already lies in one of the V(γ), V(γ*), V(γ**), ..., and its witnesses lie in the next. More
elaborate proofs extend the result to Σ₂, Σ₃, and so on. We can get a natural
model V(α) of all the axioms of ZFC other than replacement, plus any desired finite number of
instances of replacement.


9.3 Inner Models and the Status of AC 


While Gödel’s original proof of the relative consistency of AC and CH is too
complicated to reproduce here, for AC alone he later published a simplified proof,
which can at least be outlined. AC has so far been used in our discussion of
models only in proving that AC is true in V(α). In the present
discussion, let the assumption of AC be dropped, and work in ZF, Zermelo-Fraenkel
set theory without choice. Relativization Φ^Ψ to a formula Ψ is the result
of each ∀y ... being replaced by ∀y(Ψ(y) ⊃ ...), and each ∃y ... by ∃y(Ψ(y) ∧ ...), so that “for
all sets” is reinterpreted as “for all sets for which Ψ holds,” and similarly for
“some.” (Compare with the notion of relativization to a set M in §9.2.) Gödel’s
simplified proof of the relative consistency of AC consisted in showing that for a
suitable Ψ one can prove in ZF the relativization Φ^Ψ of each axiom Φ of ZF, as
well as the relativization AC^Ψ of AC. Relativization preserves logical deducibility,
and if a contradiction were deducible from the axioms of ZFC, one would
be deducible from their relativizations, and hence from the axioms of ZF which
imply those relativizations. It follows that if ZF is consistent, so is ZFC. But it will
take some work to spell out what is Gödel’s relativizing formula Ψ(x), called
HOD(x) and read “x is hereditarily ordinal definable.”

A set a is definable from an ordinal parameter, or ordinal definable, or OD for
short, if there is some formula Φ(x, y) of L and some ordinal α such that
∀x(Φ(x, α) ≡ x = a) is true. This formulation, in terms of truth, is not expressible
in L, although OD-in-a-model and Σₙ OD would be, using restricted
notions of truth. Gödel’s trick is to show that there is a (comparatively small)
n such that anything OD is Σₙ OD, allowing (an equivalent of) OD to be
expressed in L after all. To see this, suppose a is OD in the way indicated, by a
Σₖ formula for some perhaps quite large k and some ordinal α. There are only
countably many formulas of L and they can be assigned code numbers. Writing
Ψᵢ for the formula with code number i, we will have Φ = Ψᵢ for some i. Apply
the reflection principle to obtain a Σₖ absolute V(β) with β > α and a ∈ V(β). We then have this:


(1) V(β) ⊨ ∀x(Ψᵢ(x, α) ≡ x = a).


In the proof of κ · κ = κ for all alephs κ, we saw how to code pairs of ordinals
by single ordinals, and this obviously permits the coding of triples as well. Let γ
code the triple (β, α, i). Then we have Θ(a, γ), where this says:


(2) γ codes a triple (β, α, i) with β > α and i < ω such that (1) holds.


Then if Θ(x, y) is Σₙ, (2) shows that a is Σₙ OD from parameter γ. The least
δ such that a is OD by this Σₙ formula Θ from parameter δ may be called the
defining ordinal O(a) of a. So a is OD if and only if ∃δ(δ = O(a)), and this δ is necessarily
unique if it exists. We may define a wellorder on OD sets by a <_OD b if
O(a) < O(b). And the notions OD and <_OD are expressible by formulas not
much worse than Σₙ. Call a hereditarily OD, or HOD, if not only a itself but
every other member of its transitive closure a⁺ is OD. HOD(x) is Gödel’s
relativizing formula promised earlier.
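The coding of pairs and triples of ordinals by single ordinals has a familiar finite analogue on the natural numbers. As a stand-in (an illustrative assumption, not the ordering used in the proof of κ · κ = κ), the sketch below uses Cantor's pairing function and nests it to code triples, mirroring the way γ codes (β, α, i); the function names are hypothetical.

```python
# Finite analogue only (an illustrative assumption): Cantor's pairing function
# on the natural numbers stands in for the coding of pairs of ordinals.
from math import isqrt


def pair(x, y):
    """Cantor's bijection from pairs of naturals onto the naturals."""
    return (x + y) * (x + y + 1) // 2 + y


def unpair(z):
    """Inverse of pair: recover (x, y) from z."""
    w = (isqrt(8 * z + 1) - 1) // 2  # the diagonal x + y on which z lies
    y = z - w * (w + 1) // 2
    return w - y, y


def code_triple(b, a, i):
    """Code a triple by nesting pairs, as (beta, alpha, i) is coded by gamma."""
    return pair(b, pair(a, i))


def decode_triple(c):
    """Inverse of code_triple."""
    b, rest = unpair(c)
    a, i = unpair(rest)
    return b, a, i


print(pair(2, 1), unpair(7))                # 7 (2, 1)
print(decode_triple(code_triple(3, 1, 4)))  # (3, 1, 4)
```

Because the coding is a bijection that can be inverted by an explicit formula, anything definable from finitely many parameters can be treated as definable from one code, which is the role the ordinal coding plays above.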

The details of the proof that the relativization of each axiom of ZF to HOD is
a theorem of ZF will not be given in this sketch. Note that HOD is transitive in
the sense that if HOD(x) and y ∈ x, then HOD(y). One then makes use of the
criteria from §9.2 immediately above for axioms holding relativized to a
transitive set, adapted to relativization to the transitive condition HOD. For
instance, one must show that if a and b are HOD, so is {a, b}. The only element
of {a, b}⁺ not in a⁺ or b⁺ is {a, b} itself, so we need only worry about showing
it is OD given that a and b are. The principle at work in showing this is that
anything definable from OD sets (as a pair is definable from its elements) is OD.
Since different ordinals α and β may be needed to define a and b, to define the
pair we need to use the coding of pairs of ordinals by single ordinals. As for
AC^HOD, it will come down to showing that any OD partition has an OD selector.
The selector will be the one including the <_OD-least element of each cell. If the
foregoing rather sketchy sketch does nothing else, it should illustrate the
sophistication that Gödel brought to the metamathematics of set theory, which
was then taken a step further by Cohen.
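The bookkeeping with transitive closures in the pairing step can be checked on hereditarily finite sets. The sketch below assumes sets are modeled as nested frozensets and reads a⁺ as a together with its members, members of members, and so on; it computes a⁺ and confirms, for one small example, that the only element of {a, b}⁺ outside a⁺ ∪ b⁺ is {a, b} itself. Nothing about definability is modeled, and the names are illustrative.

```python
# Assumption for illustration: sets are nested frozensets, and tc_plus(x)
# plays the role of x⁺ (x itself plus everything below it).


def tc_plus(x):
    """x together with every member of x, member of a member, and so on."""
    result = {x}
    stack = [x]
    while stack:
        for y in stack.pop():
            if y not in result:
                result.add(y)
                stack.append(y)
    return frozenset(result)


empty = frozenset()
one = frozenset({empty})        # {∅}
two = frozenset({empty, one})   # {∅, {∅}}
a, b = two, frozenset({one})    # a = {∅, {∅}}, b = {{∅}}
ab = frozenset({a, b})

extra = tc_plus(ab) - (tc_plus(a) | tc_plus(b))
print(extra == {ab})            # True
```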

A by-product of this work is the observation that, because statements of 
number theory and finite combinatorics can all be formulated so as to mention 
only elements of V(ω), all of which are HOD, such a statement has the same
meaning whether or not relativized to HOD. From this it follows that if such a
statement is provable assuming AC, it is provable without it. The same is known
to hold for CH and any other hypotheses for which relative consistency has been
proved. In contrast, large cardinal axioms, for which relative consistency cannot
be proved, definitely make a difference. For instance, they imply Con(ZFC),
which can be coded as a number-theoretic or finite-combinatorial statement. To
be sure, as such it is rather artificial, and not something whose proof or disproof
would be high up on any number theorist’s or finite combinatorialist’s research
program. But for decades Harvey Friedman has been producing more and more
natural-looking examples, though the definitive exposition of his
results, working title Concrete Incompleteness, is still awaited,
while preprints on particular aspects accumulate.

We can now say something about Gödel’s proof of the consistency of CH (and
GCH). A formula Ψ amounts to, or the sets satisfying it collectively amount to
(for both modes of expression are in use), an inner model of ZFC or some related
set theory T if it shares three crucial properties established above for HOD:


Every ordinal satisfies Ψ.
Ψ is transitive: every element of a set satisfying Ψ satisfies Ψ.
Every axiom of T remains true when quantifiers are relativized to Ψ.


Gödel’s original proof worked, not with the inner model HOD, but with
another called L, whose definition is more delicate, giving more control of the
outcome. It is, in particular, the minimum inner model: any sets present in it, that
is, any sets satisfying its defining formula, must be present in any other inner
model. The sets in L are called constructible and the inner model itself is called
the constructible universe. Gödel shows that, in the constructible universe, the
principle that every set is constructible, written V = L and called the axiom of
constructibility, holds. (This is not a tautology: it says that if a set satisfies the
formula defining constructibility, then it also satisfies that formula when quantifiers
are relativized to L.) He then proceeds to show that V = L implies AC,
and CH, and indeed GCH. It has already been mentioned that Addison
went on to prove that V = L implies the existence of non-Lebesgue-measurable
PCA and CPCA sets, and that Jensen later proved that it implies ¬SH (along
with a lot of other combinatorial results), through an analysis of the so-called
fine structure of L.

For another example of a proof by inner models, long before Cohen,
Fraenkel and Andrzej Mostowski, in the 1920s and 1930s, considered the
question of the consistency of ¬AC in the context of a modification ZFU of
ZF that permits Urelemente, or atoms as they are sometimes called. In formulating
ZFU, it is necessary to add a predicate meaning “is a set” and restrict the
axiom of extensionality to sets, while asserting that non-sets or atoms all have
no elements. In the Fraenkel-Mostowski work, we need a version of ZFU
asserting that there are infinitely many atoms. They then prove that Con(ZFU)
implies Con(ZFU + ¬AC) by an inner model construction. Their methods
can also be applied to distinguish various stronger and weaker variants of AC.

The key idea is that any permutation π of the atoms determines a permutation,
by abuse of language also denoted π, of sets of atoms, and then of sets whose
elements may be atoms or sets of atoms, and so on up the cumulative hierarchy,
the general pattern being π(x) = {π(y) | y ∈ x}. If Φ(x, y) is the formula y ∈ x, it
is evident that for any a and b, the formula Φ(a, b) is true if and only if
Φ(π(a), π(b)) is true, and this can be extended by induction on complexity to
all formulas: in a slogan, “permutation preserves truth.” We then define a set x to
be of finite support (FS) if there is some finite set of atoms X such that any
permutation that leaves the elements of X fixed, meaning π(y) = y for all y in X,
will also leave x fixed. Then just as Gödel moves from OD to HOD we move
from FS to HFS, the inner model of sets x such that x itself, its elements,
elements of its elements, and so on down are all of finite support. Then
considerations of the kind used in the HOD proof show that HFS is an inner
model of ZFU. But it cannot be a model of AC, since AC implies every set can
be linearly ordered (indeed, well ordered), but no linear order R of the whole infinite set of
atoms can have finite support. For given any finite set X of atoms, take any
atoms a and b not in it, with say (a, b) ∈ R, implying (b, a) ∉ R, and consider the
permutation π that just switches those atoms, so that π applied to the pair (a, b)
is the pair (b, a). Then π changes the truth (a, b) ∈ R into (b, a) ∈ π(R),
implying R ≠ π(R).

While Gödel’s proof of consistency for CH involves
further elaboration beyond his proof of consistency for AC, with Cohen the
proof of the consistency of ¬AC is more complicated than his proof of
consistency for ¬CH, since he must combine his machinery of forcing with the
ideas of Fraenkel-Mostowski to establish for ZF what they establish for ZFU.
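The permutation argument against finite support can be run on a toy example. In the sketch below, which is an illustration under stated assumptions only, atoms are Python strings, sets are nested frozensets, ordered pairs are Kuratowski pairs {{a}, {a, b}}, and a linear order R on six atoms is a set of such pairs; the six atoms merely stand in for the infinitely many atoms of ZFU, and the helper names are hypothetical. Swapping two atoms outside a proposed support X fixes X pointwise but moves R.

```python
# Assumptions for illustration: atoms are strings, sets are nested frozensets,
# ordered pairs are Kuratowski pairs, and six atoms stand in for infinitely many.


def apply_perm(pi, x):
    """Lift a permutation of atoms (a dict; unlisted atoms are fixed) to sets:
    pi(x) = {pi(y) | y in x}."""
    if isinstance(x, frozenset):
        return frozenset(apply_perm(pi, y) for y in x)
    return pi.get(x, x)


def kpair(a, b):
    """Kuratowski ordered pair (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def breaking_swap(atoms, X):
    """Transposition of the first two atoms outside X (assumes there are two)."""
    a, b = [t for t in atoms if t not in X][:2]
    return {a: b, b: a}


atoms = ["a0", "a1", "a2", "a3", "a4", "a5"]
R = frozenset(kpair(atoms[i], atoms[j]) for i in range(6) for j in range(6) if i < j)
X = {"a0", "a1"}                  # a proposed finite support

pi = breaking_swap(atoms, X)      # swaps a2 and a3
print(all(apply_perm(pi, x) == x for x in X))  # True: X is fixed pointwise
print(apply_perm(pi, R) == R)                  # False: R is moved
```

Since R is a strict linear order, exactly one of (a2, a3) and (a3, a2) is in R, and the swap exchanges them, which is exactly the step R ≠ π(R) in the text.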


9.4 Forcing and the Status of CH 


Any account of Cohen’s forcing in a work at the elementary level of the present
one will have to be even sketchier than the sketch of Gödel’s inner models just
given; yet something can be said that will, it is hoped, give at least a taste of
forcing constructions. While Gödel started with the universe V of all
sets and contracted to an inner universe where CH is true, Cohen could not start
with the universe V of all sets and expand to an outer universe where CH is false,
since there is no room for an outer universe containing more sets than all the sets
there are. His procedure had of necessity to be a bit more indirect. What he did
was to start from a countable transitive model M of ZFC, and expand to an outer
countable transitive model N where CH is false. Now ZFC, as already remarked,
cannot prove the existence of a model of ZFC, let alone of a countable
transitive model. However, close inspection of Cohen’s work shows that the
truth of any finite subset of ZFC + ¬CH in N requires only the truth of some
finite subset of ZFC in M, and this fact, together with some logician’s tricks,
permits one to conclude that if ZFC is consistent, so is ZFC + ¬CH. But in
attempting to understand something about Cohen’s method, it is best to concentrate
on the model construction, and leave the trickery needed to extract a
relative consistency proof to the logicians.

The central notions used in forcing are, like those used in formulating Zorn’s 
lemma in §7.2, some further items to be added to the list of basic notions 
pertaining to partial orders given in §4.3. If (P, ≤) is a partial order, a subset
D ⊆ P is called dense if for every p in P there is a q ≤ p in D. Two elements p, q of
P are compatible if there is an r in P with both r ≤ p and r ≤ q. A subset A ⊆ P is
an antichain if no two of its members are compatible, and P is ccc (or satisfies
the countable chain condition) if it has no uncountable antichains. Finally, a
subset G ⊆ P is generic for a family F of dense sets if it satisfies the following:


Whenever p ∈ G and p ≤ q, then q ∈ G.
Any two elements of G are compatible.
For every D ∈ F, there is a p ∈ G with p ∈ D (so that G ∩ D is nonempty).


Note that for any countable F = {D₀, D₁, D₂, ...} there exists a generic G.
Just take any p₀ in D₀, then any p₁ ≤ p₀ in D₁, then any p₂ ≤ p₁ in D₂, and so on,
and for G take the set of q such that for some n we have pₙ ≤ q. In particular, if P
belongs to the countable model M, the set F of all its dense subsets that belong to
M is countable, and there is a generic set G for it. This G, however, in general
will not itself be in M. What Cohen shows is that there is a minimum countable
transitive model N = M[G] containing all the elements of M and G as well.
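The descending-chain construction of a generic set can be sketched concretely. As an assumption for illustration, the sketch uses the partial order of finite zero-one strings, with q ≤ p when q extends p, and represents each dense set D by a hypothetical function that extends any condition p to some q ≤ p lying in D. Only finitely many dense sets are met here, so this shows the chain p₀ ≥ p₁ ≥ ... and the resulting G, not genericity over a whole model.

```python
# Assumptions for illustration: conditions are finite 0/1 tuples, q <= p means
# q extends p, and each dense set is given by a function moving p into it.


def leq(q, p):
    """q <= p (q is stronger) when the tuple q extends p."""
    return q[:len(p)] == p


def descending_chain(extenders, p0=()):
    """p0 >= p1 >= ..., where each step moves into the next dense set."""
    chain = [p0]
    for extend in extenders:
        chain.append(extend(chain[-1]))
    return chain


def in_G(q, chain):
    """G is the set of q with p_n <= q for some n."""
    return any(leq(p, q) for p in chain)


def length_at_least(n):
    """Dense set {p : len(p) >= n}: pad with zeros."""
    return lambda p: p + (0,) * max(0, n - len(p))


def differs_from(x):
    """Dense set {p : p(i) != x(i) for some i < len(p)}, for a fixed sequence x."""
    def extend(p):
        if any(p[i] != x(i) for i in range(len(p))):
            return p
        return p + (1 - x(len(p)),)
    return extend


chain = descending_chain([length_at_least(3), differs_from(lambda i: 0),
                          differs_from(lambda i: i % 2), length_at_least(6)])
print(chain[-1])                               # (0, 0, 0, 1, 0, 0)
print(in_G((0, 0), chain), in_G((1,), chain))  # True False
```

The differs_from dense sets hint at why G need not lie in M: a G meeting every such dense set in M differs from every zero-one sequence that M contains.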

For his application to CH, or rather ¬CH, Cohen considers the ordinals α and
β in M that M “thinks” are ℵ₁ and ℵ₂. Really they are both countable, and there is
a surjection f from ω onto α and a surjection h from α onto β; it is just that there
are no such f and h in the model M. For his set P of “forcing conditions,” he takes
all functions whose domain is a finite set of pairs (γ, m) with γ < β and m < ω,
and whose values are all 0s and 1s. The partial order relation ≤ on P is just the
reverse of set-theoretic inclusion ⊆, so q ≤ p if the former is a function extending
the latter to a larger domain, and two “conditions” p and q are compatible if
there is no pair in the domain of both on which they give different values, so that
their union is still a function. For any pair (γ, m), the set D(γ, m) of p with that
pair in its domain is dense: if (γ, m) is not already given a value by p, we can
extend p to give it one. Similarly, for any γ ≠ δ the set E(γ, δ) of p for which
there is an m with p(γ, m) ≠ p(δ, m) is dense. These dense sets are all in M,
and Cohen takes a G generic for the family of all of them to form his M[G]. By
the compatibility of all elements of a generic set, the functions in G fit together
to form a big function g, which by the fact that G has nonempty intersection with
each dense D(γ, m) must have each (γ, m) with γ < β and m < ω in its domain.
Considering the E(γ, δ) it also follows that for any distinct γ, δ < β there is an
m with g(γ, m) ≠ g(δ, m). In other words, fixing γ and considering the infinite
zero-one sequence (g(γ, 0), g(γ, 1), g(γ, 2), ...) we get different sequences
for different γ, and so a sequence of length β of distinct zero-one sequences.
Since 𝔠 is the number of zero-one sequences, and β was playing the role of
ℵ₂, have we now shown the existence of a model with 𝔠 ≥ ℵ₂?

Alas, not quite. For we need to show that β, which was the ℵ₂ of M, remains
the ℵ₂ of M[G], which in turn requires that α, which was the ℵ₁ of M, remains the
ℵ₁ of M[G]; or, in other words, we need to show that no surjection from ω onto α
or from α onto β has slipped into M[G]. This is the most technical part of the
proof, and will be slighted here. Let it just be noted that it uses the fact that the
partial order P is ccc (the proof of which uses the Δ-system lemma from the end
of §8.3).
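The combinatorial part of Cohen's construction (though not the preservation of ℵ₁ and ℵ₂ just discussed) can be imitated with finitely many rows and columns. In the sketch below, which is only an illustration under that finite assumption, a condition is a dict from pairs (γ, m) to 0 or 1, extension plays the role of ≤, and the hypothetical helpers extend_D and extend_E move a condition into D(γ, m) and E(γ, δ). Meeting all these dense sets along a descending chain yields a function g whose rows are pairwise distinct.

```python
# Finite imitation (an assumption for illustration): 3 rows stand in for beta,
# 4 columns for omega; conditions are dicts {(gamma, m): 0 or 1}.
from itertools import combinations


def compatible(p, q):
    """p and q agree wherever both are defined, so their union is a function."""
    return all(p[k] == q[k] for k in p.keys() & q.keys())


def extend_D(gamma, m):
    """Move p into D(gamma, m): make sure (gamma, m) is in the domain."""
    def extend(p):
        return p if (gamma, m) in p else {**p, (gamma, m): 0}
    return extend


def extend_E(gamma, delta):
    """Move p into E(gamma, delta): make rows gamma and delta differ somewhere."""
    def extend(p):
        if any(k[0] == gamma and (delta, k[1]) in p and p[(delta, k[1])] != v
               for k, v in p.items()):
            return p
        used = {m for (row, m) in p if row in (gamma, delta)}
        m = max(used, default=-1) + 1  # a column unused in both rows
        return {**p, (gamma, m): 0, (delta, m): 1}
    return extend


rows, cols = 3, 4
steps = [extend_E(c, d) for c, d in combinations(range(rows), 2)]
steps += [extend_D(c, m) for c in range(rows) for m in range(cols)]

chain = [{}]
for step in steps:
    chain.append(step(chain[-1]))
g = chain[-1]  # each condition extends the last, so the union is the final one

width = 1 + max(m for (_, m) in g)
table = [tuple(g[(c, m)] for m in range(width)) for c in range(rows)]
print(table)  # [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 1, 0)]
```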

The Solovay-Tennenbaum forcing for SH was much more complicated,
essentially involving performing one Cohen-style extension after another in
an infinite iteration. D. A. Martin and Solovay (1970) extracted from the proof a
principle that has come to be called Martin’s axiom (MA), producing a proof of
the relative consistency of ¬CH + MA, as well as a proof that ¬CH + MA
implies SH. MA says that if a partial order P has no uncountable antichain, then
for any family F of fewer than 𝔠 dense sets there exists a generic set G for F.
Note that CH implies MA, since assuming CH “fewer than 𝔠” amounts to
“countably many,” and we have seen that generic sets for countable families of
dense sets always exist. Martin and Solovay report that, of the consequences of
CH in the Sierpinski book, about half are provable from MA alone, and about
half disprovable from MA + ¬CH, with a few left open. A particularly important
example in the former category is the following. One of the most basic
results of Lebesgue’s theory of measure is that the union of countably many sets
of measure zero still has measure zero, which assuming CH means that the
union of fewer than 𝔠 sets of measure zero still has measure zero. But this last
result actually follows without CH just from MA.

Many further applications of MA have been found, not to mention applica- 
tions of the more general method of forcing that produced it. (See Rudin (1977) 
and Burgess (1977) for examples: the volume from which these papers come 
emphasizes exposition for mathematicians who are not logicians or set theoreti- 
cians.) MA is the premier example of what is called a “forcing axiom” or 
principle saying that various kinds of thing that could be made true by 
Cohen’s method already are true. (It was shown by Jouko Vaananen and 
Jonathan Stavi in the 1970s to follow from the principle that any statement of 
a certain form that can be made true by ccc forcing and cannot then be made
false by further ccc forcing is true, which has been independently discovered or
rediscovered by others since.) Saharon Shelah initiated the study of stronger
forcing axioms than MA + ¬CH, and in particular of ones that imply not just 𝔠 ≠ ℵ₁
but 𝔠 = ℵ₂. The strongest principle in this direction has been called Martin’s
Maximum (MM), although its authors are not Martin but Matthew Foreman,
Menachem Magidor, and Shelah. It has a wealth of consequences.


10 Large Cardinals and Determinacy 


We turn next to the surprising connections that emerged, mainly in the 1970s 
and 1980s, between descriptive set theory, the theory of special sets of real 
numbers, and large cardinal theory, the loftiest part of the theory of arbitrary sets 
of arbitrary elements. 


10.1 Beyond Inaccessibles 


We have so far had occasion to mention (in §8.3) only the weakest large cardinal 
assumption, that of the existence of inaccessibles, which we called IC. But there 
is a zoo of large cardinals that are larger than inaccessibles, with more and 
stranger inhabitants than the zoo of fundamental particles in physics. Akihiro 
Kanamori (2003) provides a guide, which I cite as a one-stop source in preference
to the original papers of various authors. Four out of the many species of
larger cardinals will have some degree of importance in what is to follow, and at
least their names will be noted here in one place.

Weakly compact cardinals, introduced by Erdős and Tarski, are ones for which
the analogue of the most basic form of Ramsey’s theorem holds, that is, uncountable κ for which
we have κ → (κ)²₂ in the notation of §8.3.

Measurable cardinals, introduced by Stanislaw Ulam, are perhaps worth a 
few more extended remarks. In many cases, it is possible to distinguish among 
the subsets of some infinite set X of some cardinality κ certain ones that are in
some sense large, and certain others that are in a corresponding sense small, 
where the two classes have the following properties: 


(1) The complement of a small set is large and the complement of a large set is 
small. 

(2) A one-element set is small and an all-but-one-element set is large. 

(3) Any set included in a small set is small and any set including a large set is large. 

(4) Any union of fewer than κ small sets is small and any intersection of fewer than κ large sets
is large.


For example, if X is the unit interval ]0,1[ and κ the continuum 𝔠, then taking the
Lebesgue measure 0 sets as small and the Lebesgue measure 1 sets as large, we
have (1)-(3), and (4) also if MA is assumed. What we do not have in this example is
this:


(5) Every set is either large or small. 


For even apart from the question of the existence of nonmeasurable sets, there
are many subsets of the unit interval of measure ½ or otherwise of intermediate
size. If we do in some case get all of (1)-(5), with κ uncountable, we have in κ a measurable cardinal.
The definition (1)-(5) could also be recast in terms of the measure function μ
defined by μ(A) = 1 if A is large and μ(A) = 0 if A is small.

Woodin cardinals, named for their introducer Hugh Woodin, have a definition 
too complicated to be reproduced here. 

Supercompact cardinals likewise. One of these is used in a relative consist- 
ency proof for MM. 

As we go down the list, the cardinals are getting larger. It can be shown that if 
κ is weakly compact, then there are κ many inaccessible cardinals < κ. Similar
relations obtain between weakly compact and measurable, and so on. 


10.2 Infinite Games


We return to descriptive set theory to consider a rival to AC. By basic results in
the Kuratowski book and elsewhere, many results established for any one Polish
space will carry over to all others, and for many purposes it is easier to work
with an alternative to ℝ, either the Baire space ℕ^ℕ of infinite sequences of
natural numbers, which looks just like the irrationals (using continued fraction
expansions), or the Cantor space 2^ℕ of infinite zero-one sequences, which looks
just like the famous Cantor middle-third set (the reals in the unit interval having
a base-three expansion involving only digits 0 and 2). For any finite sequence s,
of natural numbers or of zeros and ones, let U(s) be the set of infinite sequences
beginning like s. These are the basic sets of the Baire and Cantor spaces,
comparable to open intervals with rational endpoints in ℝ, in terms of which
other point classes of Borel and projective sets can be defined.
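The correspondence between the Cantor space and the middle-third set, and the basic sets U(s), are easy to compute with on finite initial segments. In this sketch (an illustration only, using exact fractions and finite prefixes in place of infinite sequences; the function names are made up), a zero-one sequence is sent to the real whose base-three digits are twice its digits, and each U(s) lands inside an interval of length 3^(-len(s)), much as a basic open interval would.

```python
# Illustration only: finite prefixes stand in for infinite zero-one sequences.
from fractions import Fraction


def in_basic_set(x_prefix, s):
    """Does a sequence beginning with x_prefix lie in U(s)? (needs len(x_prefix) >= len(s))"""
    return tuple(x_prefix[:len(s)]) == tuple(s)


def cantor_point(bits):
    """Partial sum of the real with base-three digits 2*b, as b runs through bits."""
    return sum((Fraction(2 * b, 3 ** (k + 1)) for k, b in enumerate(bits)), Fraction(0))


def basic_interval(s):
    """Closed interval containing the image of U(s): later digits add at most 3**-len(s)."""
    lo = cantor_point(s)
    return lo, lo + Fraction(1, 3 ** len(s))


print(cantor_point((1,)))                   # 2/3
print(cantor_point((0, 1, 1)))              # 8/27
print(in_basic_set((0, 1, 1, 0), (0, 1)))   # True
```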

The Polish school initiated the study of infinite games of perfect information of
the following kind. Let A be a subset of the Baire space. Two players, IN and
OUT, alternately pick natural numbers, each knowing at each stage the other’s
previous picks, thus generating an infinite sequence x in the Baire space. IN
wins if x ∈ A, OUT wins if x ∉ A. (IN goes first, like white in chess or black in
go.) A strategy is a function from finite sequences of natural numbers to natural
numbers. A player follows a strategy S in a play of the game if that player’s
move is always the output of S for input the opponent’s sequence of previous
moves. A strategy is winning for a given player if that player always wins when
following it. Clearly the two players cannot both have winning strategies. If one or the
other does, the game and the set A are called determinate. The Axiom of
Determinacy (AD) asserts that all sets are determinate.
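The definitions of strategy and play can be made concrete for finitely many rounds. In the sketch below (the function names and the two sample strategies are made up for illustration), a strategy is any Python function from the tuple of the opponent's previous moves to a natural number, and play records the first few moves of the resulting sequence x; whether x lies in A of course depends on the whole infinite sequence, which no finite run can settle in general.

```python
# Illustration only: strategies are functions from the opponent's previous
# moves (a tuple) to a natural number; play() runs finitely many rounds.


def play(strategy_in, strategy_out, rounds):
    """First 2*rounds moves when IN follows strategy_in and OUT follows strategy_out."""
    in_moves, out_moves, x = [], [], []
    for _ in range(rounds):
        move = strategy_in(tuple(out_moves))   # IN sees OUT's moves so far
        in_moves.append(move)
        x.append(move)
        move = strategy_out(tuple(in_moves))   # OUT sees IN's moves, including the latest
        out_moves.append(move)
        x.append(move)
    return tuple(x)


def count_replies(opponent_moves):
    """Sample strategy: play the number of moves the opponent has made."""
    return len(opponent_moves)


def sum_plus_one(opponent_moves):
    """Sample strategy: play one more than the sum of the opponent's moves."""
    return sum(opponent_moves) + 1


print(play(count_replies, sum_plus_one, 3))  # (0, 1, 1, 2, 2, 4)
```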

AD is known to imply that certain regularity properties hold for all sets (such as
Lebesgue measurability, for which see Mycielski and Swierczkowski, 1964). A 
notable such result is the following from Morton Davis (1964). 


Davis Theorem Assuming AD, all sets have the perfect set property. 


Proof-sketch: The proof will be given for the Cantor space, but the result holds 
for all Polish spaces. Given a subset A of the Cantor space, consider a game that 
is asymmetrical in the sense that the two players make different kinds of moves: 
alternately IN picks finite zero-one sequences and OUT picks single digits zero 
or one. Their picks are strung together to produce an element x of the Cantor
space, with IN or OUT winning according as x is or is not in A. AD implies the
determinateness of this game. Now suppose IN has a winning strategy S. The
element of A obtained when IN follows S is different for each sequence of plays
by OUT, showing A has 𝔠 elements. A closer look shows that the set of such
elements is perfect. Now suppose OUT has a winning strategy S and consider any
element x of A. See how far the beginning of x can be divided up into alternating
finite zero-one sequences followed by the single digit that would be given by S if
OUT were following that strategy. The whole of x cannot be divided up in this
way, since then it would represent the result of OUT following S for the whole
course of play, which would mean that x is not in A. So suppose we get this far,
then are stuck:


s₀, i₀, s₁, i₁, ..., sₙ, iₙ


What is the next digit j₀ in x beyond those in this finite display of data? Since
we are supposed to be stuck, it is not what S would tell OUT to play in response
to IN’s playing the empty sequence, but rather the opposite. What is the next
digit j₁ after that? The opposite of what S would tell OUT to play if IN played the
one-term sequence (j₀). What is the next digit j₂ after that? The opposite of what
S would tell OUT to play if IN played the two-term sequence (j₀, j₁). And so on.
All the digits of x are thus generated in terms of S from the finite pattern 
displayed. Since there are only countably many such patterns, there are only 
countably many elements x of A. 
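The step in which the rest of x is regenerated from S and the finite pattern can be written out directly. In this sketch, an assumption for illustration, OUT's strategy S is a Python function taking the tuple of IN's moves so far (each a tuple of zero-one digits) and returning a digit; given IN's moves s₀, ..., sₙ from the stuck pattern, the hypothetical helper continue_stuck produces the next digits j₀, j₁, ... of x as the opposites of S's answers. The sample strategy is made up.

```python
# Assumption for illustration: OUT's strategy S maps the tuple of IN's moves
# (each a tuple of 0/1 digits) to a single digit 0 or 1.


def continue_stuck(S, in_moves, n):
    """Next n digits of x after the stuck pattern, given OUT's strategy S."""
    history = tuple(tuple(s) for s in in_moves)
    js = []
    for _ in range(n):
        # Were IN now to play (j0, ..., j_{k-1}), S would answer with some digit;
        # since we are stuck, the next digit of x must be the opposite one.
        answer = S(history + (tuple(js),))
        js.append(1 - answer)
    return tuple(js)


def parity_strategy(in_moves):
    """Sample OUT strategy: parity of the total number of digits IN has played."""
    return sum(len(s) for s in in_moves) % 2


print(continue_stuck(parity_strategy, [(0, 1)], 4))  # (1, 0, 1, 0)
```

Since the output depends only on S and the finite data s₀, ..., sₙ, each of the countably many finite patterns determines at most one x, which is the counting step of the proof.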

Since we have seen that assuming AC there are sets that do not have the 
perfect set property, the Davis theorem shows that AD contradicts AC, though 
the weaker DC is often assumed in conjunction with AD when considering its 
consequences. The existence of an indeterminate set could also be proved 
directly assuming AC, by a construction just like that used to prove the exist- 
ence of a set lacking the perfect set property. The weaker projective determinacy 
(PD) asserts only the determinateness of projective sets. But that is enough to 
imply (by much the same proof) that all of those have the perfect set property. 
PD similarly implies other regularity properties (including Lebesgue measur- 
ability). How far can we go in proving determinateness for Borel and projective 
sets? Here is the first step: 


Gale-Stewart Theorem All open sets are determinate. 


Proof-sketch: We continue working with the Cantor space, but the argument 
would be the same for the Baire space. Let A be an open subset of the Cantor 
space, meaning a union of basic sets U(s). Then if a play of the game results in a
win for IN by generating an x in A, x will belong to some U(s) included in A. But 
it takes only finitely many rounds of picks by the two players to generate s, and 
that means that by some finite stage IN will have, in effect, already won. Now 
suppose there is no winning strategy for IN. Call a position after finitely many 
rounds in the game good for OUT if IN still does not have a winning strategy for 
the continuation of the game from that point. If a position is good, then whatever 
IN picks next, there is some pick for OUT that would keep the position good. 
(Otherwise, IN could make a pick i for which this is not so, and whatever pick
OUT then made, IN would have a winning strategy for the rest of the game, meaning that
IN in effect had a winning strategy already, namely, to pick this i and follow up
with the winning strategy associated with OUT’s next pick.) Therefore OUT can
always make a pick that will keep the position good, so at no finite stage will IN 
have won, meaning that IN will not win in the end, either. Always picking to 
keep the position good is a winning strategy for OUT. 
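The "good position" bookkeeping can be seen in action in a minimal Python sketch for the special case in which A is a finite union of basic sets U(s). In that case any win for IN happens within a bounded number of rounds, so the game can be solved by backward induction. The representation (positions as tuples of digits, IN picking at even positions, a finite set W of the sequences s) and the names `make_game`, `in_wins`, and `out_pick` are assumptions for illustration. The theorem itself covers arbitrary open sets, where no such bound exists and the non-constructive argument in the text is needed.

```python
from functools import lru_cache
from typing import Callable, FrozenSet, Tuple

Pos = Tuple[int, ...]  # digits picked so far; IN picks at even lengths


def make_game(W: FrozenSet[Pos]) -> Tuple[Callable[[Pos], bool], Callable[[Pos], int]]:
    """Game in which IN wins once play enters some basic set U(s), s in W.

    Assumes W is a finite set of finite zero-one sequences, so the game is
    effectively decided after max(len(s)) picks.
    """
    depth = max((len(s) for s in W), default=0)

    def entered(pos: Pos) -> bool:
        # pos already extends some s in W, i.e. IN has in effect won
        return any(pos[:len(s)] == s for s in W)

    @lru_cache(maxsize=None)
    def in_wins(pos: Pos) -> bool:
        """True if IN has a winning strategy from pos."""
        if entered(pos):
            return True
        if len(pos) >= depth:
            return False  # no member of W can be entered any longer
        outcomes = [in_wins(pos + (d,)) for d in (0, 1)]
        # IN needs one winning pick; against OUT, every pick must win
        return any(outcomes) if len(pos) % 2 == 0 else all(outcomes)

    def out_pick(pos: Pos) -> int:
        """At a good position with OUT to move, a pick that keeps it good."""
        for d in (0, 1):
            if not in_wins(pos + (d,)):
                return d
        raise ValueError("position is not good for OUT")

    return in_wins, out_pick
```

For instance, with W = {(0, 0)} the function reports that IN has no winning strategy, and `out_pick((0,))` returns 1, the reply that keeps play out of U((0, 0)). When `in_wins` is false at a position where IN is to move, it is false at both successors, which is the claim that whatever IN picks, the position stays good.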

David Blackwell, in a modest but insightful communication of 1967, pointed 
out how this theorem could be used to give a new game-theoretic proof of a 
classical structural theorem, the Kuratowski reduction principle for coanalytic 
sets. A flurry of intense activity led a decade or so later to the beautiful picture in 
Moschovakis (2009): PD implies not only all the regularity theorems but also all 
the structural theorems that the classical descriptive set theorists of the 1920s 
and 1930s were seeking but could not find because they are not provable in ZFC. 
Great interest consequently attaches to the question of how far one can go 
beyond the Gale-Stewart theorem within ZFC, and what plausible additional
assumptions might take one all the way to PD. 

There were partial results on low-level Borel sets by Morton Davis and 
others, and then D. A. Martin proved the determinateness of analytic and 
coanalytic sets assuming a measurable cardinal. He later proved in ZFC the 
determinateness of all Borel sets. Martin (2020) is a recent recounting of both 
results. Between Martin’s two landmarks, Friedman (1971) showed, on the one 
hand, that analytic and coanalytic determinacy could not have been obtained in 
ZFC alone, and, on the other, that Borel determinacy would have to make use of essentially
the full strength of ZFC, and involve cardinals that, although not “large” by set 
theorists’ standards, are much larger than any ordinarily encountered in main- 
stream mathematics. Later Woodin, and definitively Martin and John Steel 
(1989), showed that enough really large large cardinals (more specifically,
enough Woodin cardinals) give PD. Just as there are results about the finite
whose proofs take a detour through the infinite (as discussed in §8.3), so also 
there are results about the “lower” infinite, the realm of reals and sets of reals,
whose proofs depend on the “higher” infinite, or realm of large cardinals. 

Unfortunately, although large cardinals in this way have cleared up almost all 
outstanding problems about special sets of reals, in contrast to their thus telling 
us “everything” about descriptive set theory, they in themselves tell us almost 
nothing about CH, disappointing the expectations of Gödel, for one. For if a
model M has in it an ordinal that appears to it to be a large cardinal, then it will 
still appear to be so after forcing involving any partial order P that appears to the
model to be of smaller cardinality than that cardinal. And this includes the forcing “condi-
tions” that can turn CH off or on. This was shown by Levy, Solovay, and others — 
it has to be proved anew for each type of large cardinal — in the early days of the 
reception of Cohen’s work. Hence a continuing search on the part of set theorists 
for other candidate new axioms to prove or disprove CH, with MM being one
example. 


10.3 Large Cardinals and Inner Models 


Scott (1961) discovered an important break in the series of stronger and 
stronger large cardinal axioms. The weaker ones, such as the existence of 
inaccessibles and the existence of weakly compacts, are compatible with 
V = L: if LC is either of these axioms, then Con(ZFC + LC) implies
Con(ZFC + LC + V = L). Thus “inaccessibles and weakly compacts can
exist in L.” But Scott showed that measurables cannot. (The measurable
cardinal $\kappa$ is present in L, since all ordinals are, but not its measure $\mu$, so L
does not “know” $\kappa$ is measurable.) But Kunen (1970) showed there exists an
L-like minimalistic model associated with ZFC + MC, where MC is the
existence of measurable cardinals, which, as he and subsequent workers
established, shares many of the properties of the original L, with CH being
true in it, in particular. This discovery launched the inner model program, to
find L-like models for larger and larger cardinals, and analyze their fine
structures. The program has since made considerable progress, the main
outstanding case being supercompacts. Woodin, pursuing this line of thought,
has been led to a vision of an “ultimate L” and a hypothesis that would
among other things accommodate all large cardinals (bar Reinhardt), and
imply $c = \aleph_1$.

Woodin’s current leaning in this direction contrasts with an earlier leaning in 
the direction of a different hypothesis he called (*) or “star” on account of the
illumination it would cast on many problems, one of whose implications would
be $c = \aleph_2$. It is rather rare for any paper in set theory to appear in the Annals of
Mathematics, generally accounted the leading journal in the field, but while the
present work was in preparation there appeared in it a paper of David Asperó
and Ralf Schindler (2021) proving that a souped-up variant of MM implies (*),
thus uniting two lines of work leading to the same conclusion about the value 
of c. This result has been found so newsworthy that journalistic popularizations 
have appeared, which, while avoiding technicalities, attempt to convey some- 
thing of the spirit of the work through a mix of pictures, metaphors, and 
quotations attributed to experts. See Wolchover (2021).


11 Concluding Philosophical Remarks 


The methodological problem faced by set theorists concerns what to do about 
central set-theoretic questions such as CH whose status cannot be settled on the 
basis of the axioms generally accepted by the mathematical community, those of 
ZFC. Here optimists hold that additional axioms that settle such questions can
be, or perhaps already have been, found that can be justified in a way that is in 
some sense objective, either intrinsically by claiming that they really were 
implicit in our concept of set all along, or extrinsically in terms of the wealth 
and plausibility and utility of their consequences. By contrast, pessimists 
suspect we may have to acknowledge a bifurcation in the concept of set into a 
pair of distinct concepts, one leading to $c = \aleph_1$, one leading to $c = \aleph_2$ (a view
sometimes compared to the Einsteinian bifurcation of the Newtonian notion of 
mass into rest mass and inertial mass, which makes some statements about 
“mass” true in one sense but false in another); or even worse, we may have to 
abandon altogether any attempt to settle such questions as the status of CH or 
SH, and so on, and embrace a “multiverse” of set-theoretic universes in different 
ones of which different combinations of such hypotheses hold, tracing out their 
interconnections (a view reminiscent of the speculations of those cosmologists 
who believe in a “multiverse” of different physical universes in which physical 
constants may have different values). There are also mathematicians unfriendly 
to set theory who hold that ZFC is already more than is needed to accommodate 
all really important core mathematics, and that speculations about abstruse set- 
theoretic issues going beyond this basic core should not be encouraged. 

The ontological problem raised by metaphysicians less interested in the 
internal workings of a discipline like set theory than in its external relations, 
so to speak, and especially in the relation of its objects to those of physical 
science, is this: Are abstract entities — of which sets would be a prime example — 
ultimately real, or merely useful fictions? Metaphysical or ontological realists
opt for the former, and nominalists for the latter alternative. There are also 
philosophers unfriendly to ontological metaphysics who question the intelligi- 
bility of talk of “ultimate reality.” 

Although authors of Elements are permitted, and even almost encouraged, to 
be opinionated, I do not wish to take a stand on either question here. Obviously, 
since I have undertaken to write the present work, I do not share the anti-set- 
theoretic stance (while nonetheless finding great interest in questions about how 
much set theory this or that kind of mathematics indispensably requires and how 
much it can get by without), but I disclaim any qualifications to speak on 
optimism versus pessimism. I am firmly antimetaphysical or anti-ontological 
in outlook, but I have expressed my views on that issue as well as I can 
elsewhere, and it is, in any case, an issue that has more to do with mathematics
in general than set theory in particular, which is my present subject. 

What I think it important to emphasize about the optimism versus pessimism 
and the realism versus nominalism debates before closing here is simply their 
distinctness. Philosophy is not like sciences where international bodies regulate 
the use of terminology (the names of chemical compounds or of biological
species, for instance), and as a result one finds a philosophical label such as 
“platonist” bandied about rather loosely, sometimes applied to set-theoretic 
optimists, sometimes applied to metaphysical realists. This kind of sloppiness 
can sow confusion, and so though warnings against conflating orthogonal issues 
have repeatedly been urged by Maddy and others, one more repetition of such 
cautions may be in order. 

Let it be noted, therefore, first, that even in connection with a body of 
admitted pure fiction, say the canon of Sherlock Holmes stories of Conan 
Doyle, when presented with two continuations, say pastiches by two different 
subsequent writers, there are often objective (or anyhow, not wholly subjective) 
reasons for thinking one rather than the other more in harmony with the spirit of 
the original, and more worthy of being admitted as deuterocanonical rather than 
dismissed as apocryphal. In the same way, one could hold that “ultimate L” is, in 
some sense, objectively preferable to “Martin’s maximum,” or the reverse, 
while remaining doubtful whether in the end set theory is more than a grand 
mythology. 

Let it be noted, also, second, that even in connection with physical objects 
whose reality no sane person doubts, there can be many questions about them 
that it is beyond our powers to answer. (A stock example is: What did Julius 
Caesar have for his last meal before he was assassinated?) There may even be 
deep reasons of physical principle (connected with the second law of thermo- 
dynamics) why there must be many such unanswerable physical questions. In 
the same way, one could be a firm believer in the absolute, fundamental, 
noumenal reality of the objects of set theory, while holding that many of their 
properties are entirely and forever beyond the range of human cognitive 
faculties. 

Set theory is a grand subject, whether or not its oldest and deepest questions 
can ever be answered objectively, and whether it is in the end regarded as a 
revelation of an ultimate reality or as a purely human construction. 